Discussion: [jules@jellybean.co.uk: Performance on inserts]


[jules@jellybean.co.uk: Performance on inserts]

From: Jules Bean

Weird.  When I sent this 24 hours ago, it didn't go through to the
list, although I successfully sent a message to the list only an hour
or so before that.... as far as my mail-server can tell it went off in
the right direction...


----- Forwarded message from Jules Bean <jules@jellybean.co.uk> -----

Date: Thu, 24 Aug 2000 11:04:25 +0100
From: Jules Bean <jules@jellybean.co.uk>
To: pgsql-hackers@postgresql.org
Subject: Performance on inserts
User-Agent: Mutt/1.2i

[Meta: Please advise if this is the wrong list.  I think, since this
observation relates to Pg internals, this might be the right one, but
feel free to move it back to -general if I'm wrong; I subscribe to
both in any case]

As some of you will have inferred if you've read my last couple of
posts, I'm working on a database with a structure like this (I'm
abstracting; I'm not unfortunately allowed to show you what my client
is really doing here)

A table called 'things' with two columns 'name' and 'category'. The
pair ('name','category') is a primary key.

There are ~ 10 000 000 rows in the table, and category takes values
from a more-or-less fixed set of ~1000 possibilities.  As previously
described the most popular category holds around half the rows, the
next most popular holds nearly half of those left, and most categories
occur very rarely indeed. The median is probably around 1000 (which is
less than the 10 000 you'd expect).

Anyhow, this question isn't about speeding up queries --- we already
have that in the histogram thread.  This question is about speeding up
inserts.

My standard configuration is to have a unique index on (name,category)
and a non-unique index on (category). The main table is ~ 2G on disk,
the index on (name,cat) is about the same size, the index on (cat) is
around 0.6G.
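In other words, the set-up is roughly equivalent to this (names abstracted,
and the column types here are only illustrative, since the real definition
isn't shown):

create table things (name text, category text);
create unique index things_name_cat on things (name, category);
create index things_cat on things (category);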

In this set-up inserts have dropped to the terrifyingly slow rate of
several hours per 10 000 insertions.  This is not adequate to my needs;
I occasionally have to process 1 000 000 insertions or more!

I have several ideas for speeding this up at the SQL level (including
inserting into a temp table and then using INSERT ... SELECT to remove
the overhead of separate inserts) but that's not what I want to talk
about either...
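For reference, that temp-table approach would look roughly like this (the
staging-table name and column types are illustrative only):

create temp table things_load (name text, category text);
-- bulk-load things_load here, e.g. with COPY, then:
insert into things (name, category)
    select name, category from things_load;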

What I did today, which made a staggering difference, is dropping the
non-unique index on (category). Suddenly I can insert at approx 40 000
insertions per minute, which is fine for my needs!

So why is updating the huge (2G) unique index on (name,cat) not too
much of a problem, but updating the small (600M) non-unique index on
(cat) sufficient to increase speed by around two orders of magnitude?

A possible reason has occurred to me:

The real slow-down is noticeable when I'm doing a bulk insert where
all new rows belong to the most popular category.  I know that some
btree implementations don't behave particularly sanely with several
million rows in a single key.. is the btree implementation used too
slow in this case?

I haven't collected all the performance statistics I'd like to have,
due to external time pressures, but I can say that under the new
faster configuration, the insertion process is CPU bound, with disk
access far below the levels the machine is capable of.  If I have a chance
I'll collect these stats for the old method too.

Any ideas as to what's going on here appreciated (if not, perhaps it
will point you towards an optimisation you ought to make for 7.1)

Jules

----- End forwarded message -----

-- 
Jules Bean                          |        Any sufficiently advanced 
jules@debian.org                    |  technology is indistinguishable
jules@jellybean.co.uk               |               from a perl script


Re: Performance on inserts

From: Tom Lane

Jules Bean <jules@jellybean.co.uk> writes:
> So why is updating the huge (2G) unique index on (name,cat) not too
> much of a problem, but updating the small (600M) non-unique index on
> (cat) sufficient to increase speed by around two orders of magnitude?

> The real slow-down is noticeable when I'm doing a bulk insert where
> all new rows belong to the most popular category.  I know that some
> btree implementations don't behave particularly sanely with several
> million rows in a single key.. is the btree implementation used too
> slow in this case?

Indeed, the btree code used in 7.0 and before is pretty awful in the
face of large numbers of equal keys.  I have improved it somewhat,
but it's still got a problem once there are enough equal keys to span
many index pages.  I just did this experiment:

create table fill(f1 text);
create index filli on fill(f1);
insert into fill values('foo');
insert into fill values('bar');
insert into fill values('baz');

insert into fill select * from fill;    -- repeat this many times

The runtime of the insert/select scales in a pretty horrid fashion:

# tuples inserted        6.5        current

1536                <1sec        <1sec
3072                1.56        <1sec
6144                3.70        1.84
12288                9.73        4.07
24576                93.26        38.72
49152                363.23        231.56

At the end of this process we have about 100 pages of index entries
for each of the three distinct key values, with about 330 items per page.
The fixes that I applied a month or two back improve behavior for
multiple equal keys within a page, but they do little for multiple
pages full of the same key value.  Here's the problem: the code for
INSERT starts by finding the first occurrence of the target key (using
btree descent, so that's fast enough).  But then it does this:
       /*
        * If we will need to split the page to put the item here,
        * check whether we can put the tuple somewhere to the right,
        * instead.  Keep scanning until we find enough free space or
        * reach the last page where the tuple can legally go.
        */
       while (PageGetFreeSpace(page) < itemsz &&
              !P_RIGHTMOST(lpageop) &&
              _bt_compare(rel, keysz, scankey, page, P_HIKEY) == 0)
       {
           /* step right one page */
           BlockNumber        rblkno = lpageop->btpo_next;

           _bt_relbuf(rel, buf, BT_WRITE);
           buf = _bt_getbuf(rel, rblkno, BT_WRITE);
           page = BufferGetPage(buf);
           lpageop = (BTPageOpaque) PageGetSpecialPointer(page);
       }

In other words, we do a linear scan over all the pages containing equal
key values, in the hope of finding one where we can shoehorn in the new
entry.  If the keys are roughly equal-sized, that hope is usually vain,
and we end up splitting the rightmost page that contains any keys
equal to the target key.  Subsequent inserts of the same key value fill
first the left-half split page, then the right-half, then split the
right-half page again and repeat --- but for each insert we first had
to scan over all the preceding pages containing equal keys.  That means
inserting N equal keys takes O(N^2) time, for large enough N.

After thinking about this for a while, I propose a probabilistic
solution.  Suppose that we add to the above loop a random-number check
that causes us to fall out of the loop with probability 1/3 or so;
that is, if the current page is full of the target key, then we move
right to check the next page with probability 2/3, but with probability
1/3 we give up and split the current page to insert the new key.

With this approach we'd hardly ever move more than, say, 10 pages before
splitting, so the time to insert a new key is bounded no matter how many
duplicates it has.
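To make the bounded-cost claim concrete, here is a tiny stand-alone
simulation (an illustrative sketch, not code from the Postgres tree): the
number of pages visited before we give up follows a geometric distribution,
so its mean is 1/(1-p) no matter how many pages of duplicates exist.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const double p_move_right = 2.0 / 3.0;   /* chance of stepping right */
    const int    trials = 100000;
    long         total_pages = 0;
    int          i;

    srand(42);
    for (i = 0; i < trials; i++)
    {
        int pages = 1;                        /* page found by the btree descent */

        while ((double) rand() / RAND_MAX < p_move_right)
            pages++;                          /* step right to the next page of duplicates */
        total_pages += pages;
    }
    printf("average pages visited per insert: %.2f\n",
           (double) total_pages / trials);    /* expect about 1/(1-p) */
    return 0;
}

For p = 2/3 it should print an average close to 3 pages per insert,
independent of how long the chain of duplicate pages is.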

The reason for using a random decision is that if we always stopped
after scanning a fixed number of pages, then the space freed up by the
previous decision to split would always be just out of reach, and we'd
end up with lots of half-full pages of the same key.  With a random
decision we will visit and fill both the left and right halves of a
previously split page.

Comments?  Is there a better way?  What's the best probability to use?
        regards, tom lane


Re: Performance on inserts

From: Alfred Perlstein

* Tom Lane <tgl@sss.pgh.pa.us> [000825 08:25] wrote:
> 
> In other words, we do a linear scan over all the pages containing equal
> key values, in the hope of finding one where we can shoehorn in the new
> entry.  If the keys are roughly equal-sized, that hope is usually vain,
> and we end up splitting the rightmost page that contains any keys
> equal to the target key.  Subsequent inserts of the same key value fill
> first the left-half split page, then the right-half, then split the
> right-half page again and repeat --- but for each insert we first had
> to scan over all the preceding pages containing equal keys.  That means
> inserting N equal keys takes O(N^2) time, for large enough N.
> 
> After thinking about this for a while, I propose a probabilistic
> solution.  Suppose that we add to the above loop a random-number check
> that causes us to fall out of the loop with probability 1/3 or so;
> that is, if the current page is full of the target key, then we move
> right to check the next page with probability 2/3, but with probability
> 1/3 we give up and split the current page to insert the new key.
> 
> With this approach we'd hardly ever move more than, say, 10 pages before
> splitting, so the time to insert a new key is bounded no matter how many
> duplicates it has.
> 
> The reason for using a random decision is that if we always stopped
> after scanning a fixed number of pages, then the space freed up by the
> previous decision to split would always be just out of reach, and we'd
> end up with lots of half-full pages of the same key.  With a random
> decision we will visit and fill both the left and right halves of a
> previously split page.
> 
> Comments?  Is there a better way?  What's the best probability to use?


I'm unsure if it's possible, but somehow storing the last place one
'gave up' and decided to split the page could offer a useful next-start
for the next insert.  Sort of spreading the split work amongst multiple
requests.  For some reason it looks like your algorithm might cause
problems because it plain gives up after 10 pages?
hope this helps,
-Alfred


Re: Performance on inserts

From: Tom Lane

Alfred Perlstein <bright@wintelcom.net> writes:
> I'm unsure if it's possible, but somehow storing the last place one
> 'gave up' and decided to split the page could offer a useful next-start
> for the next insert.

I think that that would create problems with concurrency --- the
Lehman-Yao btree algorithm is designed around the assumption that
writers only move right and never want to go back left to change
a prior page.  So once we've moved right we don't get to go back to
the start of the chain of duplicates.

> For some reason it looks like your algorithm might cause
> problems because it plain gives up after 10 pages?

"Give up" just means "stop looking for already-existing free space,
and make some the hard way".  The steady-state average space utilization
of this way would be somewhat worse than the existing code, probably,
but I don't see that as a big problem.
        regards, tom lane


Re: Performance on inserts

From: Alfred Perlstein

* Tom Lane <tgl@sss.pgh.pa.us> [000825 11:30] wrote:
> Alfred Perlstein <bright@wintelcom.net> writes:
> > I'm unsure if it's possible, but somehow storing the last place one
> > 'gave up' and decided to split the page could offer a useful next-start
> > for the next insert.
> 
> I think that that would create problems with concurrency --- the
> Lehman-Yao btree algorithm is designed around the assumption that
> writers only move right and never want to go back left to change
> a prior page.  So once we've moved right we don't get to go back to
> the start of the chain of duplicates.

Yes, but inconsistencies about the starting page should be handled,
because I'm sure the code takes care of multiple scans from the
start.  A slightly incorrect start-scan point is better than starting
from the beginning every time.  It's a hack, but no more so than a
random guess, and in my scheme (if it's do-able) it will eventually get
past the first bunch of full pages.

> > For some reason it looks like your algorithm might cause
> > problems because it plain gives up after 10 pages?
> 
> "Give up" just means "stop looking for already-existing free space,
> and make some the hard way".  The steady-state average space utilization
> of this way would be somewhat worse than the existing code, probably,
> but I don't see that as a big problem.

Well, there's a possibility of the end of the sequence containing
free space; allowing previous failures to be accounted for could make
a difference.

But it's just a suggestion from someone who really ought to be
studying the internals a bit more. :)

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


Re: Performance on inserts

From: Tom Lane

Alfred Perlstein <bright@wintelcom.net> writes:
> A slightly incorrect startscan point is better than starting
> from the beginning every time.

Not for lookups, it isn't ;-).

I did think about making the btree-descending code act differently
for inserts than lookups, but AFAICS that wouldn't buy anything for
this problem, since until you get down to the leaf level you can't
see where there's free space anyway.  You'd simply be exchanging the
problem of missing free space that's too far to the right for the
problem of missing free space that's to the left of where you chose
to start looking.

The algorithm does seem to work quite nicely just the way I described
it, although it turns out I was off base about a good probability
setting.  I find that something up around 0.99 seems to be good.
Using the same (perhaps overly simplistic) test case:

# tuples inserted        6.5        current+random hack @ 0.99
            Time    index size    Time    index size
1536            <1sec    90112        <1sec    106496
3072            1.56    163840        <1sec    188416
6144            3.70    286720        1.40    376832
12288            9.73    532480        2.65    688128
24576            93.26    1024000        5.22    1368064
49152            363.23    2007040        10.34    2727936
98304                        22.07    5545984
196608                        45.60    11141120
393216                        92.53    22290432

I tried probabilities from 0.67 to 0.999 and found that runtimes didn't
vary a whole lot (though this is near the minimum), while index size
consistently got larger as the probability of moving right decreased.
The runtime is nicely linear throughout the range.

The index size increase might look like a bit of a jump, but actually
this is right where we want it to be.  The old code was effectively
packing each page as full as it could be under these conditions.  That's
not what you want for a btree.  Steady-state load factor for btrees is
usually quoted as somewhere around 70%, and this method manages to
approximate that pretty well with a move-right probability of 0.99 or
a bit less.
        regards, tom lane


Re: Performance on inserts

From: Thomas Lockhart

> Comments?  Is there a better way?  What's the best probability to use?

For this particular example, "partial indices" seems to be the best fit.
The index can be chosen to omit the most common value(s), since those
would indicate a sequential scan anyway.

Other DBs allow a parameter to set the "fill ratio" of index pages,
which might also help. But probably not as much as you might like when
one is doing a large number of inserts at a time.

Your "randomized" algorithm looks very promising. What is the status of
partial indices? Are they functional now, or have they been broken
forever (I'm not recalling)?
                - Thomas


Re: Performance on inserts

From: Tom Lane

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> What is the status of
> partial indices? Are they functional now, or have they been broken
> forever (I'm not recalling)?

They've been diked out of gram.y's syntax for CREATE INDEX at least
since Postgres95.  No way to tell who did that, why or when, AFAIK.
There is still an awful lot of support code for them, however.

I have no good way to guess how much bit-rot has occurred in all that
unexercised code ... but it'd be interesting to try to get it going
again.
        regards, tom lane


Re: Performance on inserts

From: Thomas Lockhart

> I have no good way to guess how much bit-rot has occurred in all that
> unexercised code ... but it'd be interesting to try to get it going
> again.

Yes, it is a *great* feature, since it directly addresses the problems
associated with one of the most common non-random data distributions
(the index can be constructed to completely ignore those most common
values, and hence be smaller, less often updated, and holding only those
values which might actually be used in an index scan). If we don't get
to outer joins, this would be a good second choice for 7.1 ;)
                    - Thomas


Re: Performance on inserts

From: Hannu Krosing

Tom Lane wrote:
> 
> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> > What is the status of
> > partial indices? Are they functional now, or have they been broken
> > forever (I'm not recalling)?
> 
> They've been diked out of gram.y's syntax for CREATE INDEX at least
> since Postgres95.  No way to tell who did that, why or when, AFAIK.
> There is still an awful lot of support code for them, however.

I suspect that current indexes don't store nulls (and are thereby 
partial indexes in relation to nulls ;)

At least the following suggests it:
---8<------------8<------------8<------------8<---------
hannu=> explain select * from test1 where i=777;
NOTICE:  QUERY PLAN:

Index Scan using test1_i_ndx on test1  (cost=2.05 rows=2 width=8)

EXPLAIN
hannu=> explain select * from test1 where i is null;
NOTICE:  QUERY PLAN:

Seq Scan on test1  (cost=3144.36 rows=27307 width=8)
---8<------------8<------------8<------------8<---------

As the logic to include or not include something in an index seems to be
there for nulls (and thus can't be very badly bit-rotten), it should be
possible to extend it for simpler =x or in(x,y,z) conditions.

> I have no good way to guess how much bit-rot has occurred in all that
> unexercised code ... but it'd be interesting to try to get it going
> again.

Of course the IS NULL case may just be hard-wired into the optimiser, in
which case there may not be much use for the current code.

IIRC Postgres95 was the first postgres with SQL, so if even that did not
have partial indexes then no SQL grammar for postgreSQL ever had them.
(Or maybe Illustra did, but it was a separate effort anyway ;)

---------------
Hannu


Re: Performance on inserts

From: Jules Bean

On Fri, Aug 25, 2000 at 07:00:22PM -0400, Tom Lane wrote:
> The algorithm does seem to work quite nicely just the way I described
> it, although it turns out I was off base about a good probability
> setting.  I find that something up around 0.99 seems to be good.
> Using the same (perhaps overly simplistic) test case:
> 
> # tuples inserted        6.5        current+random hack @ 0.99
>             Time    index size    Time    index size
> 1536            <1sec    90112        <1sec    106496
> 3072            1.56    163840        <1sec    188416
> 6144            3.70    286720        1.40    376832
> 12288            9.73    532480        2.65    688128
> 24576            93.26    1024000        5.22    1368064
> 49152            363.23    2007040        10.34    2727936
> 98304                        22.07    5545984
> 196608                        45.60    11141120
> 393216                        92.53    22290432
> 
> I tried probabilities from 0.67 to 0.999 and found that runtimes didn't
> vary a whole lot (though this is near the minimum), while index size
> consistently got larger as the probability of moving right decreased.
> The runtime is nicely linear throughout the range.

That looks brilliant!! (Bearing in mind that I have over 10 million
tuples in my table, you can imagine what performance was like for me!)
Is there any chance you could generate a patch against released 7.0.2
to add just this functionality... It would be the kiss of life for my
code!

(Not in a hurry, I'm not back in work until Wednesday, as it happens)

And, of course, what would /really/ get my code going speedily would
be the partial indices mentioned elsewhere in this thread.  If the
backend could automagically drop keys containing > 10% (tunable) of
the rows from the index, then my index would be (a) about 70% smaller!
and (b) only used when it's faster. [This means it would have to
update some simple histogram data.  However, I can't see that being
much of an overhead]

For the short term, if I can get a working version of the above
randomisation patch, I think I shall 'fake' a partial index by
manually setting 'enable_seqscan=off' for all but the 4 or 5 most
common categories. Those two factors combined will speed up my bulk
inserts a lot.
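Something along these lines, say (the category value is invented, and the
exact SET syntax may vary between versions):

set enable_seqscan to 'off';   -- only for the rare categories
select * from things where category = 'some-rare-category';
set enable_seqscan to 'on';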

One other idea, though:

Is there any simple way for Pg to combine inserts into one bulk?
Specifically, their effect on the index files.  It has always seemed
to me to be one of the (many) glaring flaws in SQL that the INSERT
statement only takes one row at a time.  But, using INSERT ... SELECT,
I can imagine that it might be possible to do 'bulk' index
updating, so that the scanning process is done once per 'batch'.

If I can make an analogy with sorted files (which indices are rather
like), if I wanted to add another 100 lines to a 1000 line sorted
file, I'd sort the 100 first, and then 'merge' them in.  Whilst I
realise that indices aren't stored sorted (no need), I think it ought
to be possible to construct an efficient algorithm for merging two
btrees?

Jules

-- 
Jules Bean                          |        Any sufficiently advanced 
jules@debian.org                    |  technology is indistinguishable
jules@jellybean.co.uk               |               from a perl script


Re: Performance on inserts

From: Matthew Kirkwood

On Sat, 26 Aug 2000, Jules Bean wrote:

> Is there any simple way for Pg to combine inserts into one bulk?
> Specifically, their effect on the index files.  It has always seemed
> to me to be one of the (many) glaring flaws in SQL that the INSERT
> statement only takes one row at a time.

One of MySQL's little syntax abuses allows:

INSERT INTO tab (col1, ..) VALUES (val1, ..), (val2, ..);

which is nice for avoiding database round trips.  It's one
of the reasons that mysql can do a bulk import so quickly.

> But, using INSERT ... SELECT, I can imagine that it might be possible
> to do 'bulk' index updating, so that the scanning process is done once per
> 'batch'.

Logic for these two cases would be excellent.

Matthew.



Re: Performance on inserts

From: Alfred Perlstein

* Matthew Kirkwood <matthew@hairy.beasts.org> [000826 04:22] wrote:
> On Sat, 26 Aug 2000, Jules Bean wrote:
> 
> > Is there any simple way for Pg to combine inserts into one bulk?
> > Specifically, their effect on the index files.  It has always seemed
> > to me to be one of the (many) glaring flaws in SQL that the INSERT
> > statement only takes one row at a time.
> 
> One of MySQL's little syntax abuses allows:
> 
> INSERT INTO tab (col1, ..) VALUES (val1, ..), (val2, ..);
> 
> which is nice for avoiding database round trips.  It's one
> of the reasons that mysql can do a bulk import so quickly.

That would be an _extremely_ useful feature if it made a difference
in postgresql's insert speed.

> 
> > But, using INSERT ... SELECT, I can imagine that it might be possible
> to do 'bulk' index updating, so that the scanning process is done once per
> > 'batch'.
> 
> Logic for these two cases would be excellent.

We do this sometimes, works pretty nicely.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


Re: Performance on inserts

From: Tom Lane

Matthew Kirkwood <matthew@hairy.beasts.org> writes:
> One of MySQL's little syntax abuses allows:
> INSERT INTO tab (col1, ..) VALUES (val1, ..), (val2, ..);

Actually, that's perfectly standard SQL92, just an item we haven't
got round to supporting yet.  (Until we do the fabled querytree
restructuring, it seems a lot harder than it is worth.)

COPY FROM stdin is definitely the fastest way of inserting data,
however, since you avoid a ton of parse/plan overhead that way.
Of course you also lose the ability to have column defaults
computed for you, etc ... there's no free lunch ...
        regards, tom lane
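(For illustration, using the table from earlier in the thread -- the values
are invented and the two columns are tab-separated:)

COPY things FROM stdin;
widget-0001	common-category
widget-0002	rare-category
\.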


Re: Performance on inserts

From: Tom Lane

Jules Bean <jules@jellybean.co.uk> writes:
> Is there any chance you could generate a patch against released 7.0.2
> to add just this functionality... It would be the kiss of life for my
> code!

Will look at it.  Are you brave enough to want to try the rest of the
7.1 rewrite of the btree code, or do you just want this one hack?

> And, of course, what would /really/ get my code going speedily would
> be the partial indices mentioned elsewhere in this thread.  If the
> backend could automagically drop keys containing > 10% (tunable) of
> the rows from the index, then my index would be (a) about 70% smaller!

I don't think anyone was envisioning "automagic" drop of most common
values.  The partial-index support that's partially there ;-) is
designed around manual specification of a predicate, ie, you'd say
CREATE INDEX mypartialindex ON table (column)
    WHERE column != 42 AND column != 1066

if you wanted a partial index omitting values 42 and 1066.  The backend
would then consider using the index to process queries wherein it can
prove that the query's WHERE implies the index predicate.  For example
SELECT * FROM table WHERE column = 11

would be able to use this index but
SELECT * FROM table WHERE column < 100

would not.

You could certainly write a little periodic-maintenance script to
determine the most common values in your tables and recreate your
partial indexes accordingly ... but I doubt it'd make sense to try
to get the system to do that automatically on-the-fly.
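As a rough sketch of such a script (the index name is invented, and this
assumes the partial-index syntax is working again):

-- find the current most common value
select category, count(*) as n
from things
group by category
order by n desc
limit 1;

-- then rebuild the partial index to exclude it
drop index things_cat_partial;          -- if it already exists
create index things_cat_partial on things (category)
    where category != 'most-common-value-found-above';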

> For the short term, if I can get a working version of the above
> randomisation patch, I think I shall 'fake' a partial index by
> manually setting 'enable_seqscan=off' for all but the 4 or 5 most
> common categories. Those two factors combined will speed up my bulk
> inserts a lot.

Uh, enable_seqscan has nothing to do with how inserts are handled...

> Is there any simple way for Pg to combine inserts into one bulk?

COPY.

> Specifically, their effect on the index files.

This particular problem couldn't be cured by batching inserts anyway.
The performance problem was coming from the actual act of inserting
a key (or more specifically, making room for the key) and that's just
got to be done for each key AFAICS.
        regards, tom lane


Re: Performance on inserts

From: Jules Bean

On Sat, Aug 26, 2000 at 12:32:20PM -0400, Tom Lane wrote:
> Jules Bean <jules@jellybean.co.uk> writes:
> > Is there any chance you could generate a patch against released 7.0.2
> > to add just this functionality... It would be the kiss of life for my
> > code!
> 
> Will look at it.  Are you brave enough to want to try the rest of the
> 7.1 rewrite of the btree code, or do you just want this one hack?

Hmm.  I don't know :-) because I don't really know the scope of the
changes... I do need this database to be stable, but OTOH, changes to
the index code are unlikely to corrupt my data.

> 
> I don't think anyone was envisioning "automagic" drop of most common
> values.  The partial-index support that's partially there ;-) is
> designed around manual specification of a predicate, ie, you'd say
> 
>     CREATE INDEX mypartialindex ON table (column)
>         WHERE column != 42 AND column != 1066

Fair enough.

> 
> > For the short term, if I can get a working version of the above
> > randomisation patch, I think I shall 'fake' a partial index by
> > manually setting 'enable_seqscan=off' for all but the 4 or 5 most
> > common categories. Those two factors combined will speed up my bulk
> > inserts a lot.
> 
> Uh, enable_seqscan has nothing to do with how inserts are handled...

Um, that was a thinko!  What I was trying to say is: the randomisation
will speed up my bulk inserts, whereas enable_seqscan=off will speed
up the other slow queries in my application, namely "SELECT * where
category='foo'" type queries.

> > Specifically, their effect on the index files.
> 
> This particular problem couldn't be cured by batching inserts anyway.
> The performance problem was coming from the actual act of inserting
> a key (or more specifically, making room for the key) and that's just
> got to be done for each key AFAICS.

In principle, it could be helpful to know you're inserting 10000
tuples with category='xyz'.  Then you only make the left-to-right
scan once, and you fill up every hole you see, before finally
splitting the last page and inserting approx (10000)/(num_per_page)
new pages all at once.  This is surely quicker than doing the
left-to-right scan for each tuple (even though the difference would be
far less noticeable in the presence of your probabilistic patch).

Jules

-- 
Jules Bean                          |        Any sufficiently advanced 
jules@debian.org                    |  technology is indistinguishable
jules@jellybean.co.uk               |               from a perl script


Re: Performance on inserts

From: Tom Lane

Tom Lane <tgl@sss.pgh.pa.us> writes:
> Jules Bean <jules@jellybean.co.uk> writes:
>> Is there any chance you could generate a patch against released 7.0.2
>> to add just this functionality... It would be the kiss of life for my
>> code!

> Will look at it.

I looked at it and decided that I don't want to mess with it.  The
BTP_CHAIN logic in 7.0 is so weird and fragile that it's hard to tell
what will happen if we try to split anything but the last page of a
chain of duplicates.  The idea I'd had of just dropping in the whole
btree module from current sources doesn't look very practical either;
a lot of changes that span module boundaries would have to be backed out
of it.

My recommendation is to try out current sources (from a nightly snapshot
or CVS).  I realize that running a non-released version might make you
uncomfortable, and rightly so; but I believe that current sources are in
pretty good shape right now.  In any case, working out any bugs lurking
in current strikes me as a much more profitable activity than trying to
back-patch the old btree code.
        regards, tom lane


Re: Performance on inserts

From: Chris

Matthew Kirkwood wrote:
> 
> On Sat, 26 Aug 2000, Jules Bean wrote:
> 
> > Is there any simple way for Pg to combine inserts into one bulk?
> > Specifically, their effect on the index files.  It has always seemed
> > to me to be one of the (many) glaring flaws in SQL that the INSERT
> > statement only takes one row at a time.
> 
> One of MySQL's little syntax abuses allows:
> 
> INSERT INTO tab (col1, ..) VALUES (val1, ..), (val2, ..);

Wouldn't....

INSERT INTO tab (col1, ..) VALUES (val1, ..); INSERT INTO tab (col1, ..)
VALUES (val2, ..);

also avoid a database round trip??


Re: Performance on inserts

From: Oliver Teuber

On Sat, Aug 26, 2000 at 12:14:06PM +0100, Matthew Kirkwood wrote:
> On Sat, 26 Aug 2000, Jules Bean wrote:
> 
> > Is there any simple way for Pg to combine inserts into one bulk?
> > Specifically, their effect on the index files.  It has always seemed
> > to me to be one of the (many) glaring flaws in SQL that the INSERT
> > statement only takes one row at a time.
> 
> One of MySQL's little syntax abuses allows:
> 
> INSERT INTO tab (col1, ..) VALUES (val1, ..), (val2, ..);
> 
> which is nice for avoiding database round trips.  It's one
> of the reasons that mysql can do a bulk import so quickly.
>
copy seems to be very fast ... I don't know why all people use
multiple inserts ??? just use copy ...

yours, oliver teuber


SQL COPY syntax extension (was: Performance on inserts)

From: Hannu Krosing

Oliver Teuber wrote:
> 
> On Sat, Aug 26, 2000 at 12:14:06PM +0100, Matthew Kirkwood wrote:
> > On Sat, 26 Aug 2000, Jules Bean wrote:
> >
> > > Is there any simple way for Pg to combine inserts into one bulk?
> > > Specifically, their effect on the index files.  It has always seemed
> > > to me to be one of the (many) glaring flaws in SQL that the INSERT
> > > statement only takes one row at a time.
> >
> > One of MySQL's little syntax abuses allows:
> >
> > INSERT INTO tab (col1, ..) VALUES (val1, ..), (val2, ..);
> >
> > which is nice for avoiding database round trips.  It's one
> > of the reasons that mysql can do a bulk import so quickly.
> >
> copy seems to be very fast ... i dont know why all people use
> multiple inserts ??? just use copy ...
>

Could COPY be extended to support a more SQL-friendly syntax, like

COPY tablename FROM VALUES(
 (x1,y1,z1),
 (x2,y2,z2),
 (x3,y3,z3)
);

The main reason I have not used COPY very much is its non-SQL syntax
for field values.

Extending the COPY command would probably be much easier than speeding
up INSERTs.

-------------
Hannu


Re: SQL COPY syntax extension (was: Performance on inserts)

From: Thomas Lockhart

> Could copy be extended to support a more SQL-friendly syntax. like
> COPY tablename FROM VALUES(
>  (x1,y1,z1),
>  (x2,y2,z2),
>  (x3,y3,z3)
> );
> Extending the COPY command would probably be much easier than speeding up
> INSERTS.

That syntax is a lot like a real SQL9x INSERT. Supporting multiple rows
for inserts is probably not that difficult; but since INSERT is used so
much we would have to make sure we don't screw up something else. At the
moment, expanding a line of SQL into multiple internal querytree nodes
is a bit crufty but is possible.
                 - Thomas


Re: SQL COPY syntax extension (was: Performance on inserts)

From: Tom Lane

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> That syntax is a lot like a real SQL9x INSERT.

Multiple row constructors in INSERT is one of my to-do items for the
planned querytree redesign.  I have not thought it was worth messing
with until we're ready to bite that bullet, however.
        regards, tom lane


Re: SQL COPY syntax extension (was: Performance on inserts)

From: Hannu Krosing

Thomas Lockhart wrote:
> 
> > Could copy be extended to support a more SQL-friendly syntax. like
> > COPY tablename FROM VALUES(
> >  (x1,y1,z1),
> >  (x2,y2,z2),
> >  (x3,y3,z3)
> > );
> > Extending the COPY command would probably be much easier than speeding up
> > INSERTS.
> 
> That syntax is a lot like a real SQL9x INSERT. Supporting multiple rows
> for inserts is probably not that difficult; but since INSERT is used so
> much we would have to make sure we don't screw up something else. At the
> moment, expanding a line of SQL into multiple internal querytree nodes
> is a bit crufty but is possible.

What I actually had in mind was a more SQL-like syntax for copy, i.e. no
default arguments, all fields required etc., that would be easy to bolt
onto the current copy machinery but still use 'SQL' syntax (no . or \. or
\\. for EOD, NULL for NULL values, quotes around strings ...)

------------
Hannu


Re: SQL COPY syntax extension (was: Performance on inserts)

From: Hannu Krosing

Tom Lane wrote:
> 
> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> > That syntax is a lot like a real SQL9x INSERT.
> 
> Multiple row constructors in INSERT is one of my to-do items for the
> planned querytree redesign.  I have not thought it was worth messing
> with until we're ready to bite that bullet, however.

What is the status of this querytree redesign ?

I've spent some time trying to get a grip of the _exact_ meaning of the
WITH RECURSIVE syntax in SQL3/99 as I badly need it in a project of
mine.

(I know the _general_ meaning - it is for querying tree-structured data
;)

The things the new querytree should address should be (at least ;) -

1. OUTER JOINs
2. WITH RECURSIVE
3. Support for positioned UPDATE & DELETE (probably requires a lot more
   than just querytree redesign)
4. Easier special-casing of optimisations (like using an index on x for
   'select max(x) from t', but not for 'select max(x) from t where n=7')

Is the special mailing-list for querytree redesign active?

--------------
Hannu


Re: SQL COPY syntax extension (was: Performance on inserts)

From: Tom Lane

Hannu Krosing <hannu@tm.ee> writes:
> What is the status of this querytree redesign ?

Waiting for 7.2 cycle, as far as I know.

> The things the new querytree should address should be (at least ;) -
> 2. WITH RECURSIVE

I don't think RECURSIVE is a querytree issue --- it looks like a much
bigger problem than that :-(

The things I'm concerned about fixing with querytree redesign are
* full SQL92 joins
* subselects in FROM
* view bugs (grouping and aggregates in views)
* INSERT ... SELECT bugs
* reimplement UNION/INTERSECT/EXCEPT in a less hacky way, make cases like
  SELECT ... UNION ... ORDER BY work.  Not to mention UNION etc in a
  subselect or in INSERT/SELECT.
* convert WHERE x IN (subselect) to a join-like representation

These are all things that have gone unfixed for years because they're
essentially unfixable with the current single-level representation of
a query.
        regards, tom lane


Re: SQL COPY syntax extension (was: Performance on inserts)

From: Tom Lane

Hannu Krosing <hannu@tm.ee> writes:
> What I actually had in mind was a more SQL-like syntax for copy,
> i.e. no default arguments, all fields required etc. that would be easy
> to bolt onto the current copy machinery but still use 'SQL' syntax (no . or
> \. or \\. for EOD, NULL for NULL values, quotes around strings ...)

Seems like a reasonable idea, although I'd recommend sticking to the
convention that \. on a line means EOD, to avoid having to hack the
client-side libraries.  As long as you leave that alone, libpq,
libpgtcl, etc etc should be transparent to the copy data format.
        regards, tom lane


Re: SQL COPY syntax extension (was: Performance on inserts)

From: Thomas Lockhart

> The things I'm concerned about fixing with querytree redesign are
...

And it will be a building block for
 * distributed databases
 * replication
 * CORBA/alternate protocols
 * other cool stuff ;)
               - Thomas


Re: Performance on inserts

From: Jules Bean

On Sat, Aug 26, 2000 at 05:09:53PM -0400, Tom Lane wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> > Jules Bean <jules@jellybean.co.uk> writes:
> >> Is there any chance you could generate a patch against released 7.0.2
> >> to add just this functionality... It would be the kiss of life for my
> >> code!
> 
> > Will look at it.
> 
> I looked at it and decided that I don't want to mess with it.  The
> BTP_CHAIN logic in 7.0 is so weird and fragile that it's hard to tell
> what will happen if we try to split anything but the last page of a
> chain of duplicates.  The idea I'd had of just dropping in the whole
> btree module from current sources doesn't look very practical either;
> a lot of changes that span module boundaries would have to be backed out
> of it.
> 
> My recommendation is to try out current sources (from a nightly snapshot
> or CVS).  I realize that running a non-released version might make you
> uncomfortable, and rightly so; but I believe that current sources are in
> pretty good shape right now.  In any case, working out any bugs lurking
> in current strikes me as a much more profitable activity than trying to
> back-patch the old btree code.

OK.  Thanks very much for going through this with me.

I'm actually going to simply do without the index for this release of
my software -- it's an internal project, and the speed problems of
having no index aren't disastrous.  However, I'd rather not fight any
instabilities; this project runs long-term and mostly unattended.

I'll certainly keenly upgrade to 7.1 when it comes out, though, with
the new btree logic.

Jules


Re: Performance on inserts

From: Bruce Momjian

> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> > What is the status of
> > partial indices? Are they functional now, or have they been broken
> > forever (I'm not recalling)?
> 
> They've been diked out of gram.y's syntax for CREATE INDEX at least
> since Postgres95.  No way to tell who did that, why or when, AFAIK.
> There is still an awful lot of support code for them, however.

It may have been me.  Not sure.  At the time, no one was sure what they
did, or if we even wanted them.  They were broken, I think.



Re: Performance on inserts

From: Bruce Momjian

> > 98304                        22.07    5545984
> > 196608                        45.60    11141120
> > 393216                        92.53    22290432
> > 
> > I tried probabilities from 0.67 to 0.999 and found that runtimes didn't
> > vary a whole lot (though this is near the minimum), while index size
> > consistently got larger as the probability of moving right decreased.
> > The runtime is nicely linear throughout the range.
> 
> That looks brilliant!! (Bearing in mind that I have over 10 million
> tuples in my table, you can imagine what performance was like for me!)
> Is there any chance you could generate a patch against released 7.0.2
> to add just this functionality... It would be the kiss of life for my
> code!
> 
> (Not in a hurry, I'm not back in work until Wednesday, as it happens)
> 
> And, of course, what would /really/ get my code going speedily would
> be the partial indices mentioned elsewhere in this thread.  If the
> backend could automagically drop keys containing > 10% (tunable) of
> the rows from the index, then my index would be (a) about 70% smaller!
> and (b) only used when it's faster. [This means it would have to
> update some simple histogram data.  However, I can't see that being
> much of an overhead]
> 
> For the short term, if I can get a working version of the above
> randomisation patch, I think I shall 'fake' a partial index by
> manually setting 'enable_seqscan=off' for all but the 4 or 5 most
> common categories. Those two factors combined will speed up my bulk
> inserts a lot.

What would be really nifty is to take the most common value found by
VACUUM ANALYZE, and cause sequential scans if that value represents more
than 50% of the entries in the table.

Added to TODO:

* Prevent index lookups (or index entries using partial index) on most common values; instead use sequential scan 



Re: Performance on inserts

From: Bruce Momjian

> Matthew Kirkwood <matthew@hairy.beasts.org> writes:
> > One of MySQL's little syntax abuses allows:
> > INSERT INTO tab (col1, ..) VALUES (val1, ..), (val2, ..);
> 
> Actually, that's perfectly standard SQL92, just an item we haven't
> got round to supporting yet.  (Until we do the fabled querytree
> restructuring, it seems a lot harder than it is worth.)

Added to TODO:
INSERT INTO tab (col1, ..) VALUES (val1, ..), (val2, ..);



Re: Performance on inserts

From: Tom Lane

Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>>>> What is the status of partial indices?
>>
>> They've been diked out of gram.y's syntax for CREATE INDEX at least
>> since Postgres95.  No way to tell who did that, why or when, AFAIK.
>> There is still an awful lot of support code for them, however.

> It may have been me.  Not sure.  At the time, no one was sure what they
> did, or if we even wanted them.  They were broken, I think.

Wasn't you unless you were hacking the code before it entered our CVS
system.  I checked the oldest CVS version (Postgres95 release) and saw
that partial indexes were disabled in gram.y in that version.
        regards, tom lane


Re: Performance on inserts

From: Tom Lane

> * Prevent index lookups (or index entries using partial index) on most
>   common values; instead use sequential scan 

This behavior already exists for the most common value, and would
exist for any additional values that we had stats for.  Don't see
why you think a separate TODO item is needed.
        regards, tom lane


Re: Performance on inserts

From: Bruce Momjian

> > * Prevent index lookups (or index entries using partial index) on most
> >   common values; instead use sequential scan 
> 
> This behavior already exists for the most common value, and would
> exist for any additional values that we had stats for.  Don't see
> why you think a separate TODO item is needed.

You mean the optimizer already skips an index lookup for the most common
value, and instead does a sequential scan?  Seems you are way ahead of
me.




Re: Performance on inserts

From: Tom Lane

Bruce Momjian <pgman@candle.pha.pa.us> writes:
>>>> * Prevent index lookups (or index entries using partial index) on most
>>>> common values; instead use sequential scan 
>> 
>> This behavior already exists for the most common value, and would
>> exist for any additional values that we had stats for.  Don't see
>> why you think a separate TODO item is needed.

> You mean the optimizer already skips an index lookup for the most common
> value, and instead does a sequential scan?

No, it goes for the sequential scan if it estimates the cost of the
indexscan as more than sequential.  Indexscan cost depends on estimated
number of retrieved rows --- which it can estimate from pg_statistic
if the query is WHERE column = mostcommonvalue.  So which plan you get
depends on just how common the most common value is.

Hard-wiring either choice of plan for the most common value would be
inferior to what the code already does, AFAICS.  But for values other
than the-most-common, we don't have adequate stats in pg_statistic,
and so you may or may not get a good estimated row count and hence
a good choice of plan.  That's what needs to be fixed.
        regards, tom lane
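(One way to see which plan the optimizer picks for a particular constant is
simply to ask it; the category values here are placeholders:)

explain select * from things where category = 'the-most-common-category';
explain select * from things where category = 'a-rare-category';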


Re: Performance on inserts

From: Bruce Momjian

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >>>> * Prevent index lookups (or index entries using partial index) on most
> >>>> common values; instead use sequential scan 
> >> 
> >> This behavior already exists for the most common value, and would
> >> exist for any additional values that we had stats for.  Don't see
> >> why you think a separate TODO item is needed.
> 
> > You mean the optimizer already skips an index lookup for the most common
> > value, and instead does a sequential scan?
> 
> No, it goes for the sequential scan if it estimates the cost of the
> indexscan as more than sequential.  Indexscan cost depends on estimated
> number of retrieved rows --- which it can estimate from pg_statistic
> if the query is WHERE column = mostcommonvalue.  So which plan you get
> depends on just how common the most common value is.
> 
> Hard-wiring either choice of plan for the most common value would be
> inferior to what the code already does, AFAICS.  But for values other
> than the-most-common, we don't have adequate stats in pg_statistic,
> and so you may or may not get a good estimated row count and hence
> a good choice of plan.  That's what needs to be fixed.

OK, I remember now.  If the most common value is used as a constant, it
uses the value from pg_statistic for most common, rather than using
the dispersion value.  That is great.

What I am more concerned about is a join that uses the most common
value.  We do an index scan in that case.  I wonder if we could get
something into the executor that would switch to sequential scan when
the most common value is hit.  Is that worth pursuing?






Re: Performance on inserts

From: Tom Lane

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> What I am more concerned about is a join that uses the most common
> value.  We do an index scan in that case.

No, we do whichever plan looks cheapest.  Again, it's all about
statistics.

Right now, eqjoinsel() is just a stub that returns a constant
selectivity estimate.  It might be useful to compute some more
sophisticated value based on pg_statistic entries for the two
columns, but right now I doubt you could tell much.  Should keep
the join case in mind when we extend the statistics...
        regards, tom lane


Re: Performance on inserts

From: Bruce Momjian

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > What I am more concerned about is a join that uses the most common
> > value.  We do an index scan in that case.
> 
> No, we do whichever plan looks cheapest.  Again, it's all about
> statistics.
> 
> Right now, eqjoinsel() is just a stub that returns a constant
> selectivity estimate.  It might be useful to compute some more
> sophisticated value based on pg_statistic entries for the two
> columns, but right now I doubt you could tell much.  Should keep
> the join case in mind when we extend the statistics...

OK, let me be more specific. Suppose the most common value in a column
is 3.  For a query "col = 3", we know 3 is most common, and use the most
common statistics rather than the dispersion statistic, right?

OK, let's assume use of the most common statistic causes a sequential
scan, but use of dispersion causes an index scan.

The query "col = 3" uses sequential scan. In the query "col = tab2.col2",
the dispersion statistic is used, causing an index scan.  

However, assume tab2.col2 equals 3.  I assume this would cause an index
scan because the executor doesn't know about the most common value,
right? Is it worth trying to improve that?



Re: Performance on inserts

From: Tom Lane

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> However, assume tab2.col2 equals 3.  I assume this would cause an index
> scan because the executor doesn't know about the most common value,
> right? Is it worth trying to improve that?

Oh, I see: you are assuming that a nestloop join is being done, and
wondering if it's worthwhile to switch dynamically between seqscan
and indexscan for each scan of the inner relation, depending on exactly
what value is being supplied from the outer relation for that scan.
Hmm.

Not sure if it's worth the trouble or not.  Nestloop is usually a
last-resort join strategy anyway, and is unlikely to be picked when the
tables are large enough to make performance be a big issue.
        regards, tom lane


Re: Performance on inserts

From: Bruce Momjian

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > However, assume tab2.col2 equals 3.  I assume this would cause an index
> > scan because the executor doesn't know about the most common value,
> > right? Is it worth trying to improve that?
> 
> Oh, I see: you are assuming that a nestloop join is being done, and
> wondering if it's worthwhile to switch dynamically between seqscan
> and indexscan for each scan of the inner relation, depending on exactly
> what value is being supplied from the outer relation for that scan.
> Hmm.
> 
> Not sure if it's worth the trouble or not.  Nestloop is usually a
> last-resort join strategy anyway, and is unlikely to be picked when the
> tables are large enough to make performance be a big issue.

Yes, I realize only nested loop has this problem.  Mergejoin and
Hashjoin actually would grab the whole table via sequential scan, so the
index is not involved, right?

Let me ask, if I do the query, "tab1.col = tab2.col and tab2.col = 3",
the system would use an index to get tab2.col, but then what method
would it use to join to tab1?  Nested loop because it thinks it is going
to get only one row from tab1.col1.  If 3 is the most common value in
tab1.col, it is going to get lots more than one row, right?



Re: Performance on inserts

From: Tom Lane

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Yes, I realize only nested loop has this problem.  Mergejoin and
> Hashjoin actually would grab the whole table via sequential scan, so the
> index is not involved, right?

They'd grab the whole table after applying restriction clauses.  An
indexscan might be used if there's an appropriate restriction clause
for either table, or to sort a table for merge join...

> Let me ask, if I do the query, "tab1.col = tab2.col and tab2.col = 3",
> the system would use an index to get tab2.col, but then what method
> would it use to join to tab1?  Nested loop because it thinks it is going
> to get only one row from tab1.col1.

I don't think it'd think that.  The optimizer is not presently smart
enough to make the transitive deduction that tab1.col = 3 (it does
recognize transitive equality of Vars, but doesn't extend that to
non-variable values).  So it won't see any restriction clause for
tab1 here.
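(Until that changes, the redundant clause can of course be spelled out by
hand, e.g.:)

select *
from tab1, tab2
where tab1.col = tab2.col
  and tab2.col = 3
  and tab1.col = 3;    -- the transitive restriction made explicit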

If it thinks that tab2.col = 3 will yield one row, it might well choose
a nested loop with tab2 as the outer, rather than merge or hash join.
So an inner indexscan for tab1 is definitely a possible plan.
        regards, tom lane


Re: Performance on inserts

From: Bruce Momjian

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Yes, I realize only nested loop has this problem.  Mergejoin and
> > Hashjoin actually would grab the whole table via sequential scan, so the
> > index is not involved, right?
> 
> They'd grab the whole table after applying restriction clauses.  An
> indexscan might be used if there's an appropriate restriction clause
> for either table, or to sort a table for merge join...
> 
> > Let me ask, if I do the query, "tab1.col = tab2.col and tab2.col = 3",
> > the system would use an index to get tab2.col, but then what method
> > would it use to join to tab1?  Nested loop because it thinks it is going
> > to get only one row from tab1.col1.
> 
> I don't think it'd think that.  The optimizer is not presently smart
> enough to make the transitive deduction that tab1.col = 3 (it does
> recognize transitive equality of Vars, but doesn't extend that to
> non-variable values).  So it won't see any restriction clause for
> tab1 here.
> 
> If it thinks that tab2.col = 3 will yield one row, it might well choose
> a nested loop with tab2 as the outer, rather than merge or hash join.
> So an inner indexscan for tab1 is definitely a possible plan.

Yes, that was my point, that a nested loop could easily be involved if
the joined table has a restriction.  Is there a TODO item here?



Re: Performance on inserts

From: Tom Lane

Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> So an inner indexscan for tab1 is definitely a possible plan.

> Yes, that was my point, that a nested loop could easily be involved if
> the joined table has a restriction.  Is there a TODO item here?

More like a "to investigate" --- I'm not sold on the idea that a
dynamic switch in plan types would be a win.  Maybe it would be,
but...

One thing to think about is that it'd be critically dependent on having
accurate statistics.  Currently, the planner only places bets on the
average behavior over a whole join.  If you make a separate bet on each
scan, then you open up the risk of betting wrong every time, should
your stats be out-of-date or otherwise misleading.
        regards, tom lane


Re: Performance on inserts

From: Bruce Momjian

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> So an inner indexscan for tab1 is definitely a possible plan.
> 
> > Yes, that was my point, that a nested loop could easily be involved if
> > the joined table has a restriction.  Is there a TODO item here?
> 
> More like a "to investigate" --- I'm not sold on the idea that a
> dynamic switch in plan types would be a win.  Maybe it would be,
> but...
> 
> One thing to think about is that it'd be critically dependent on having
> accurate statistics.  Currently, the planner only places bets on the
> average behavior over a whole join.  If you make a separate bet on each
> scan, then you open up the risk of betting wrong every time, should
> your stats be out-of-date or otherwise misleading.

I agree.  Not sure how to approach this, but I am sure it is dealt with
by most database systems.  Can someone find out how other db's handle
this?  Is there any research on it?
