Обсуждение: Slow tab completion w/ lots of tables

Поиск
Список
Период
Сортировка

Slow tab completion w/ lots of tables

От
Stephen Frost
Дата:
Greetings,
 When doing tab-completion under 9.1, pg_table_is_visible(oid) is slow and is ending up as the first thing tested
againstall the rows in pg_class.  Increasing the cost of pg_table_is_visible() up to 10 causes it to move to the end of
thetests, which improves things greatly- I thought there was a plan to make that the default..?
 
 This is with 9.1.4.
 After researching this a bit, I'm left wondering why we're using substring() to do the matching test.  I don't see any
easyway to index a substring() call which can be of any size (depending on how many characters are preceding the user
hitting'tab').  On the other hand, using LIKE with 'string%' and indexing with varchar_pattern_ops should give us a
niceindex which could be used for tab completion, right?  If no one points out an obvious flaw in that, I'll take a
lookat making that happen.
 
     Thanks!
    Stephen

Re: Slow tab completion w/ lots of tables

От
Robert Haas
Дата:
On Thu, Aug 16, 2012 at 10:37 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Greetings,
>
>   When doing tab-completion under 9.1, pg_table_is_visible(oid) is slow
>   and is ending up as the first thing tested against all the rows
>   in pg_class.  Increasing the cost of pg_table_is_visible() up to
>   10 causes it to move to the end of the tests, which improves things
>   greatly- I thought there was a plan to make that the default..?
>
>   This is with 9.1.4.

Is this a regression versus earlier releases, or just a bad thing in general?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Slow tab completion w/ lots of tables

От
Stephen Frost
Дата:
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Aug 16, 2012 at 10:37 PM, Stephen Frost <sfrost@snowman.net> wrote:
> >   When doing tab-completion under 9.1, pg_table_is_visible(oid) is slow
> >   and is ending up as the first thing tested against all the rows
> >   in pg_class.  Increasing the cost of pg_table_is_visible() up to
> >   10 causes it to move to the end of the tests, which improves things
> >   greatly- I thought there was a plan to make that the default..?
> >
> >   This is with 9.1.4.
>
> Is this a regression versus earlier releases, or just a bad thing in general?

It's really a regression- in prior releases, we had the
pg_table_is_visible() test later in the WHERE clause, so that call
wasn't invoked as often.  This all happened when Tom reworked the psql
SQL calls to be defined in an array instead of in-line'd (which was a
good change, but moved pg_table_is_visible() up to the front of the
WHERE clause, slowing things down).
Thanks,
    Stephen

Re: Slow tab completion w/ lots of tables

От
Stephen Frost
Дата:
* Stephen Frost (sfrost@snowman.net) wrote:
> > Is this a regression versus earlier releases, or just a bad thing in general?
>
> It's really a regression- in prior releases

Sorry, to clarify (after reading through my -hackers inbox a bit more
and realizing you were probably asking about 9.2)- it's a regression,
but it was also in earlier releases..  I'd have to go back to the git
blame that I ran earlier to find it, but I think the change was made
in 8.4 or 9.0, so I don't think it's a regression as far as
9.0 -> 9.1 or 9.1 -> 9.2 is concerned.

I'm slightly ashamed to admit that we discovered it during our migration
from 8.2 -> 9.1...
Thanks,
    Stephen

Re: Slow tab completion w/ lots of tables

От
Stephen Frost
Дата:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Aug 16, 2012 at 10:37 PM, Stephen Frost <sfrost@snowman.net> wrote:
> >   When doing tab-completion under 9.1, pg_table_is_visible(oid) is slow
> >   and is ending up as the first thing tested against all the rows
> >   in pg_class.  Increasing the cost of pg_table_is_visible() up to
> >   10 causes it to move to the end of the tests, which improves things
> >   greatly- I thought there was a plan to make that the default..?
> >
> >   This is with 9.1.4.
>
> Is this a regression versus earlier releases, or just a bad thing in general?

Alright, so, yea, the commit I was referring to is this one:

e84487f67a0d216f3db87b2558f1edd322a09e48

Which was apparently in the 8.3 dev cycle, so this would be an
8.2 -> 8.3 regression (as I mentioned in my last email, on this
particular database, we're going from 8.2 -> 9.1).

My vote is that everyone else needs to have databases with more tables,
or they need to care about tab-completion speed more. :)  There are a
few complaints in the archives though, so I'm not alone..

Would changing the cost of pg_table_is_visible() require a catversion
bump..?  Or maybe just do it w/o the bump for 9.2 and tell beta testers
that they might want to make the change by hand?  Or bump it as part of
early 9.3?

I do also want to change tab-complete around a bit to make it so that we
can actually index the query based on the name, which would clearly be a
9.3 thing.  I was expecting some push-back on this idea, but havn't
heard any yet.  Would people accept adding an index on pg_class.relname
to support fast tab-completion?  Or is this going to expand into
figuring out how to support index-based partial lookups for the 'name'
type, so we could use the existing index (if that's even possible to
do...)?
Thanks,
    Stephen

Re: Slow tab completion w/ lots of tables

От
"Kevin Grittner"
Дата:
Stephen Frost <sfrost@snowman.net> wrote:
> Would people accept adding an index on pg_class.relname to support
> fast tab-completion? Or is this going to expand into figuring out
> how to support index-based partial lookups for the 'name' type, so
> we could use the existing index (if that's even possible 
> to do...)?
That already seems to work for me.
explain analyze
select relname from pg_class where relname like 'Party%';Index Scan using pg_class_relname_nsp_index on pg_class
(cost=0.00..0.47rows=1 width=64)    (actual time=0.027..0.060 rows=19 loops=1)  Index Cond: ((relname >= 'Party'::name)
            AND (relname < 'Partz'::name))  Filter: (relname ~~ 'Party%'::text)Total runtime: 0.084 ms
 
-Kevin



Re: Slow tab completion w/ lots of tables

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * Stephen Frost (sfrost@snowman.net) wrote:
>>> Is this a regression versus earlier releases, or just a bad thing in general?

>> It's really a regression- in prior releases

> Sorry, to clarify (after reading through my -hackers inbox a bit more
> and realizing you were probably asking about 9.2)- it's a regression,
> but it was also in earlier releases..  I'd have to go back to the git
> blame that I ran earlier to find it, but I think the change was made
> in 8.4 or 9.0, so I don't think it's a regression as far as
> 9.0 -> 9.1 or 9.1 -> 9.2 is concerned.

> I'm slightly ashamed to admit that we discovered it during our migration
> from 8.2 -> 9.1...

Um ... I don't see any difference in the clause ordering from 8.2
forward.  "SELECT * FROM baz<TAB>" produces a query like this in 8.2:

LOG:  statement: SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class c WHERE c.relkind IN ('r', 'S', 'v')
ANDsubstring(pg_catalog.quote_ident(c.relname),1,3)='baz' AND pg_catalog.pg_table_is_visible(c.oid) AND c.relnamespace
<>(SELECT oid FROM pg_catalog.pg_namespace WHERE nspname = 'pg_catalog')UNIONSELECT pg_catalog.quote_ident(n.nspname)
||'.' FROM pg_catalog.pg_namespace n WHERE substring(pg_catalog.quote_ident(n.nspname) || '.',1,3)='baz' AND (SELECT
pg_catalog.count(*)FROM pg_catalog.pg_namespace WHERE substring(pg_catalog.quote_ident(nspname) || '.',1,3) =
substring('baz',1,pg_catalog.length(pg_catalog.quote_ident(nspname))+1))> 1UNIONSELECT
pg_catalog.quote_ident(n.nspname)|| '.' || pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class c,
pg_catalog.pg_namespacen WHERE c.relnamespace = n.oid AND c.relkind IN ('r', 'S', 'v') AND
substring(pg_catalog.quote_ident(n.nspname)|| '.' || pg_catalog.quote_ident(c.relname),1,3)='baz' AND
substring(pg_catalog.quote_ident(n.nspname)|| '.',1,3) =
substring('baz',1,pg_catalog.length(pg_catalog.quote_ident(n.nspname))+1)AND (SELECT pg_catalog.count(*) FROM
pg_catalog.pg_namespaceWHERE substring(pg_catalog.quote_ident(nspname) || '.',1,3) =
substring('baz',1,pg_catalog.length(pg_catalog.quote_ident(nspname))+1))= 1LIMIT 1000
 

and the only difference in HEAD is another relkind in the IN clauses.
        regards, tom lane



Re: Slow tab completion w/ lots of tables

От
Stephen Frost
Дата:
* Kevin Grittner (Kevin.Grittner@wicourts.gov) wrote:
> That already seems to work for me.

Ah, yes, that does..  I was using the query from tab-complete.c, which
wraps it in quote_ident(c.relanme), which ends up preventing us from
using the index.

There's a couple of other interesting corner cases, such as:

select * from "spa<tab>

Will return nothing, but:

select * from spa<tab>

Works just fine.

Similar, if you have a table called "TEST", then:

select * from "TE

works, but:

select * from TE
(or any other combo w/o a quote)

doesn't work.

Clearly, I'd be happier if we could use the index.  To be honest, I
think it'd also be nice if we could auto-quote tables (I've got quite a
few people who do things like create tables through ODBC, or migrate
from another DB where everything is all-uppercase, or mixed-case, and it
sure would be nice if those "just worked").  I'm not sure that there's
much use-case for asking people to prefix their table with a "- seems
more like a surprise that it's required than expected.

That said, I'm sure there are points to be made about how there'd be
problems with not requiring the ", or with things more complex than just
mixed-case situations.  Perhaps we need to go through and try to
enumerate what we want to happen in each situation and try to reach a
concensus (and a plan for how to implement it..).
Thanks,
    Stephen

Re: Slow tab completion w/ lots of tables

От
Stephen Frost
Дата:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Um ... I don't see any difference in the clause ordering from 8.2
> forward.  "SELECT * FROM baz<TAB>" produces a query like this in 8.2:

Odd..  I could have sworn I saw a difference in the query generated,
but perhaps I just assumed it was reordered, since it ends up ordered
differently in the actual plans:

8.2: http://explain.depesz.com/s/449
9.1: http://explain.depesz.com/s/03O

Increasing the cost of pg_table_is_visible() caused it to move that test
to the end again (under 9.1), which improved the performance (though I'm
sure having a usable index would help a great deal more..).
Thanks,
    Stephen

Re: Slow tab completion w/ lots of tables

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> Is this a regression versus earlier releases, or just a bad thing in general?

> Alright, so, yea, the commit I was referring to is this one:
> e84487f67a0d216f3db87b2558f1edd322a09e48
> Which was apparently in the 8.3 dev cycle, so this would be an
> 8.2 -> 8.3 regression (as I mentioned in my last email, on this
> particular database, we're going from 8.2 -> 9.1).

Uh, no, that was in the 7.4 cycle.

> My vote is that everyone else needs to have databases with more tables,
> or they need to care about tab-completion speed more. :)

Agreed, but we're not doing anything about that in 9.2.

> I do also want to change tab-complete around a bit to make it so that we
> can actually index the query based on the name, which would clearly be a
> 9.3 thing.  I was expecting some push-back on this idea, but havn't
> heard any yet.  Would people accept adding an index on pg_class.relname
> to support fast tab-completion?  Or is this going to expand into
> figuring out how to support index-based partial lookups for the 'name'
> type, so we could use the existing index (if that's even possible to
> do...)?

The LIKE idea is interesting.  What you'd still need is to suppress the
quote_ident function call so that it becomes just "relname LIKE 'foo%'".
Which seems do-able if possibly rather ugly.  That would leave us with
SELECT ... FROM foo<TAB> being fast but SELECT ... FROM "foo<TAB> being
slow; not sure if that's good enough.  I do *not* want to add an index
on quote_ident(relname) to pg_class though.  Quite aside from speed
issues, I'm pretty sure that functional indexes on core system catalogs
would be a huge can of worms.
        regards, tom lane



Re: Slow tab completion w/ lots of tables

От
Stephen Frost
Дата:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> The LIKE idea is interesting.  What you'd still need is to suppress the
> quote_ident function call so that it becomes just "relname LIKE 'foo%'".
> Which seems do-able if possibly rather ugly.  That would leave us with
> SELECT ... FROM foo<TAB> being fast but SELECT ... FROM "foo<TAB> being
> slow; not sure if that's good enough.  I do *not* want to add an index
> on quote_ident(relname) to pg_class though.  Quite aside from speed
> issues, I'm pretty sure that functional indexes on core system catalogs
> would be a huge can of worms.

That's the kind of concern that I was expecting, to be honest. :)  As
Kevin's pointed out, it's not likely to be needed anyway..  There's a
bit of an open question still regarding case-insensitive searching, but
perhaps we let that be slow and only done if we don't get any answers
back from a case-sensetive search?

For example, given these tables:

TEST
test

Doing:

select * from TE<tab>  -> "TEST"
select * from "TE<tab>  -> "TEST"
select * from te<tab>  -> test
select * from "te<tab>  -> test
select * from Te<tab>  -> test (but slow)
select * from "Te<tab>  -> test (but slow)

We'd essentially do: LIKE 'xx%', and then run quote_ident() on the
result (I assume we can replace the whole word, right?).  I'd also
strip off any ", for the purposes of searching with tab-completion.  I'm
not sure how easy it'd be to have a fall-back setup.  I do wonder if we
should do what I often recommend my dev do though, which is to have a
SQL or pl/pgsql function defined on the database-side, rather than
sending large/complex queries to the database from the application..
Thanks,
    Stephen

Re: Slow tab completion w/ lots of tables

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> There's a couple of other interesting corner cases, such as:

Yeah.  I had been thinking of this as purely a performance issue, but
if we want to consider adjusting the visible behavior as well, that
makes it a completely different thing.

> select * from "spa<tab>
> Will return nothing, but:
> select * from spa<tab>
> Works just fine.

Of course this is because quote_ident() only quotes if necessary.

> Similar, if you have a table called "TEST", then:
> select * from "TE
> works, but:
> select * from TE
> (or any other combo w/o a quote)
> doesn't work.

And that's because we're comparing to quote_ident() output, which
will have quotes in this case.

I wonder whether it would work to do something like this: if the word to
be tab-completed starts with a quote, compare to quote_ident output;
otherwise, compare to relname (or other name column) directly.

We'd still emit quote_ident output, which means that if you did
select * from TE<TAB>

it would change that to
select * from "TEST

(assuming you had say TEST1 and TEST2 so it couldn't complete further).
That seems like it would be a step forward in usability.  I'm not sure
that this covers all the corner cases usability-wise though.

From a performance point of view, this would be fast for the unquoted
case (if combined with Stephen's idea to use LIKE).  It'd still be slow
for quoted input though.

But then take another step: if the word-so-far has a leading quote and
no embedded quotes, we can clearly strip the leading quote and compare
the rest directly to the name column.  So that can be fast.  The only
cases that need be slow are names with embedded quotes, which surely
isn't a case that too many people care about.

In short, I think we might be able to make this fast, and more usable,
just with hacking on psql's query generation rules.  There's no need for
server-side changes.
        regards, tom lane



Re: Slow tab completion w/ lots of tables

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> That's the kind of concern that I was expecting, to be honest. :)  As
> Kevin's pointed out, it's not likely to be needed anyway..  There's a
> bit of an open question still regarding case-insensitive searching, but
> perhaps we let that be slow and only done if we don't get any answers
> back from a case-sensetive search?

Um, I don't believe we do any case-insensitive search now, do we?

> We'd essentially do: LIKE 'xx%', and then run quote_ident() on the
> result (I assume we can replace the whole word, right?).  I'd also
> strip off any ", for the purposes of searching with tab-completion.

I think you might be saying the same thing I said in my prior message,
but not quite sure.

> I'm
> not sure how easy it'd be to have a fall-back setup.  I do wonder if we
> should do what I often recommend my dev do though, which is to have a
> SQL or pl/pgsql function defined on the database-side, rather than
> sending large/complex queries to the database from the application..

The nice thing about keeping this knowledge on the psql side is it would
still work with older servers.
        regards, tom lane



Re: Slow tab completion w/ lots of tables

От
"Kevin Grittner"
Дата:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We'd still emit quote_ident output, which means that if you did
> 
>     select * from TE<TAB>
> 
> it would change that to
> 
>     select * from "TEST
> 
> (assuming you had say TEST1 and TEST2 so it couldn't complete
> further).
> if the word-so-far has a leading quote and no embedded quotes, we
> can clearly strip the leading quote and compare the rest directly
> to the name column.  So that can be fast.  The only cases that
> need be slow are names with embedded quotes, which surely isn't a
> case that too many people care about.
> 
> In short, I think we might be able to make this fast, and more
> usable, just with hacking on psql's query generation rules. 
> There's no need for server-side changes.
I like it.  A lot.  With our camel-case naming convention, it would
make life a lot easier.
-Kevin



Re: Slow tab completion w/ lots of tables

От
Robert Haas
Дата:
On Tue, Aug 21, 2012 at 1:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In short, I think we might be able to make this fast, and more usable,
> just with hacking on psql's query generation rules.  There's no need for
> server-side changes.

So, I think that hacking on psql's query generation rules may well be
a good idea, but shouldn't we also be bumping procost for the
pg_whatever_is_visible functions?  I mean, Stephen's information
suggests that those values are pretty clearly wrong, regardless of
anything else.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Slow tab completion w/ lots of tables

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> So, I think that hacking on psql's query generation rules may well be
> a good idea, but shouldn't we also be bumping procost for the
> pg_whatever_is_visible functions?  I mean, Stephen's information
> suggests that those values are pretty clearly wrong, regardless of
> anything else.

Yeah, I think we discussed that once before.  I have no particular
objection to doing that in HEAD, just think it's a bit late for 9.2.
In any case, it will only help for medium-size numbers of entries;
once you get to enough tables/functions/whatever that a seqscan of the
catalog is bad news, only fixing the name matching is going to help.
        regards, tom lane



Re: Slow tab completion w/ lots of tables

От
Stephen Frost
Дата:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Um, I don't believe we do any case-insensitive search now, do we?

No, I don't suppose we do..  I was thinking we ran quote_ident() on the
search-string side, but apparently we don't, meaning:

select * from TE<tab>

doesn't find 'test'.  I suppose it's alright to keep it that way.

> > We'd essentially do: LIKE 'xx%', and then run quote_ident() on the
> > result (I assume we can replace the whole word, right?).  I'd also
> > strip off any ", for the purposes of searching with tab-completion.
>
> I think you might be saying the same thing I said in my prior message,
> but not quite sure.

You added the distinction that we have to watch out for embedded quotes.
Otherwise, I believe we had the same approach, which is to strip off a
leading quote, if there is one, and then compare the raw string directly
to relname using: LIKE 'xxx%';  If there is an embedded quote, go back
to using quote_ident and matching the whole string.
Thanks,
    Stephen