Обсуждение: Slow functional indexes?

Поиск
Список
Период
Сортировка

Slow functional indexes?

От
Stuart Bishop
Дата:
I would like to understand what causes some of my indexes to be slower to
use than others with PostgreSQL 8.1. On a particular table, I have an int4
primary key, an indexed unique text 'name' column and a functional index of
type text. The function (person_sort_key()) is declared IMMUTABLE and
RETURNS NULL ON NULL INPUT.

A simple query ordering by each of these columns generates nearly identical
query plans, however runtime differences are significantly slower using the
functional index. If I add a new column to the table containing the result
of the function, index it and query ordering by this new column then the
runtime is nearly an order of magnitude faster than using the functional
index (and again, query plans are nearly identical).

(The following log is also at
http://rafb.net/paste/results/vKVuyi47.nln.html if that is more readable)

demo=# vacuum full analyze person;
VACUUM
demo=# reindex table person;
REINDEX
demo=# cluster person_pkey on person;
CLUSTER
demo=# explain analyze select * from person order by id offset 527000 limit 50;
                                                                 QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=37039.09..37042.61 rows=50 width=531) (actual
time=1870.393..1870.643 rows=50 loops=1)
   ->  Index Scan using person_pkey on person  (cost=0.00..37093.42
rows=527773 width=531) (actual time=0.077..1133.659 rows=527050 loops=1)
 Total runtime: 1870.792 ms
(3 rows)

demo=# cluster person_name_key on person;
CLUSTER
demo=# explain analyze select * from person order by name offset 527000
limit 50;
                                                                   QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=63727.87..63733.91 rows=50 width=531) (actual
time=1865.769..1866.028 rows=50 loops=1)
   ->  Index Scan using person_name_key on person  (cost=0.00..63821.34
rows=527773 width=531) (actual time=0.068..1138.649 rows=527050 loops=1)
 Total runtime: 1866.153 ms
(3 rows)

demo=# cluster person_sorting_idx on person;
CLUSTER
demo=# explain analyze select * from person order by
person_sort_key(displayname,name) offset 527000 limit 50;
                                                                     QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=65806.62..65812.86 rows=50 width=531) (actual
time=13846.677..13848.102 rows=50 loops=1)
   ->  Index Scan using person_sorting_idx on person  (cost=0.00..65903.14
rows=527773 width=531) (actual time=0.214..13093.090 rows=527050 loops=1)
 Total runtime: 13848.254 ms
(3 rows)

demo=# alter table person add column sort_key text;
ALTER TABLE
demo=# update person set sort_key=person_sort_key(displayname,name);
UPDATE 527773
demo=# create index person_sort_key_idx on person(sort_key);
CREATE INDEX
demo=# vacuum analyze person;
VACUUM
demo=# cluster person_sort_key_idx on person;
CLUSTER
demo=# explain analyze select * from person order by sort_key offset 527000
limit 50;
                                                                     QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=41069.28..41073.18 rows=50 width=553) (actual
time=1999.456..1999.724 rows=50 loops=1)
   ->  Index Scan using person_sort_key_idx on person  (cost=0.00..41129.52
rows=527773 width=553) (actual time=0.079..1274.952 rows=527050 loops=1)
 Total runtime: 1999.858 ms
(3 rows)


--
Stuart Bishop <stuart.bishop@canonical.com>   http://www.canonical.com/
Canonical Ltd.                                http://www.ubuntu.com/


Вложения

Re: Slow functional indexes?

От
"Merlin Moncure"
Дата:
On 10/20/06, Stuart Bishop <stuart.bishop@canonical.com> wrote:
> I would like to understand what causes some of my indexes to be slower to
> use than others with PostgreSQL 8.1. On a particular table, I have an int4
> primary key, an indexed unique text 'name' column and a functional index of
> type text. The function (person_sort_key()) is declared IMMUTABLE and
> RETURNS NULL ON NULL INPUT.

database will not allow you to create index if the function is not immutable.

> A simple query ordering by each of these columns generates nearly identical
> query plans, however runtime differences are significantly slower using the
> functional index. If I add a new column to the table containing the result
> of the function, index it and query ordering by this new column then the
> runtime is nearly an order of magnitude faster than using the functional
> index (and again, query plans are nearly identical).

> demo=# explain analyze select * from person order by id offset 527000 limit 50;
>                                                                  QUERY PLAN

it looks you just turned up a bad interaction between a functional
index and 'offset' probably your function is getting executed extra
times or there is a sort going on.  however, I'd suggest not using
'offset', because its bad design.

merlin

Re: Slow functional indexes?

От
Tom Lane
Дата:
Stuart Bishop <stuart.bishop@canonical.com> writes:
> I would like to understand what causes some of my indexes to be slower to
> use than others with PostgreSQL 8.1.

I was about to opine that it was all about different levels of
correlation between the index order and physical table order ... but
your experiments with freshly clustered indexes seem to cast doubt
on that idea.  Are you sure your function is really immutable?  A buggy
function could possibly lead to a "clustered" index not being in
physical order.

            regards, tom lane

Re: Slow functional indexes?

От
Stuart Bishop
Дата:
Tom Lane wrote:
> Stuart Bishop <stuart.bishop@canonical.com> writes:
>> I would like to understand what causes some of my indexes to be slower to
>> use than others with PostgreSQL 8.1.
>
> I was about to opine that it was all about different levels of
> correlation between the index order and physical table order ... but
> your experiments with freshly clustered indexes seem to cast doubt
> on that idea.  Are you sure your function is really immutable?  A buggy
> function could possibly lead to a "clustered" index not being in
> physical order.

Definitely immutable. Here is the function definition:


CREATE OR REPLACE FUNCTION person_sort_key(displayname text, name text)
RETURNS text
LANGUAGE plpythonu IMMUTABLE RETURNS NULL ON NULL INPUT AS
$$
    # NB: If this implementation is changed, the person_sort_idx needs to be
    # rebuilt along with any other indexes using it.
    import re

    try:
        strip_re = SD["strip_re"]
    except KeyError:
        strip_re = re.compile("(?:[^\w\s]|[\d_])", re.U)
        SD["strip_re"] = strip_re

    displayname, name = args

    # Strip noise out of displayname. We do not have to bother with
    # name, as we know it is just plain ascii.
    displayname = strip_re.sub('', displayname.decode('UTF-8').lower())
    return ("%s, %s" % (displayname.strip(), name)).encode('UTF-8')
$$;


--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/


Вложения

Re: Slow functional indexes?

От
Stuart Bishop
Дата:
Stuart Bishop wrote:
> I would like to understand what causes some of my indexes to be slower to
> use than others with PostgreSQL 8.1. On a particular table, I have an int4
> primary key, an indexed unique text 'name' column and a functional index of
> type text. The function (person_sort_key()) is declared IMMUTABLE and
> RETURNS NULL ON NULL INPUT.
>
> A simple query ordering by each of these columns generates nearly identical
> query plans, however runtime differences are significantly slower using the
> functional index. If I add a new column to the table containing the result
> of the function, index it and query ordering by this new column then the
> runtime is nearly an order of magnitude faster than using the functional
> index (and again, query plans are nearly identical).
>
> (The following log is also at
> http://rafb.net/paste/results/vKVuyi47.nln.html if that is more readable)

Here is a minimal test case that demonstrates the issue. Can anyone else
reproduce these results? Of the four EXPLAIN ANALYZE SELECT statements at
the end, the one that orders by a user created IMMUTABLE stored procedure is
consistently slower than the other three variants.


BEGIN;
DROP TABLE TestCase;
COMMIT;
ABORT;

BEGIN;
CREATE TABLE TestCase (name text, alt_name text);

CREATE OR REPLACE FUNCTION munge(s text) RETURNS text
IMMUTABLE RETURNS NULL ON NULL INPUT
LANGUAGE plpgsql AS $$
BEGIN
    RETURN lower(s);
END;
$$;

-- Fill the table with random strings
CREATE OR REPLACE FUNCTION fill_testcase(num_rows int) RETURNS boolean
LANGUAGE plpgsql AS
$$
DECLARE
    row_num int;
    char_num int;
    name text;
BEGIN
    FOR row_num IN 1..num_rows LOOP
        name := '';
        FOR  char_num IN 1..round(random() * 100) LOOP
            name := name || chr((
                round(random() * (ascii('z') - ascii('!'))) + ascii('!')
                )::int);
        END LOOP;
        INSERT INTO TestCase VALUES (name, lower(name));
        IF row_num % 20000 = 0 THEN
            RAISE NOTICE '% of % rows inserted', row_num, num_rows;
        END IF;
    END LOOP;
    RETURN TRUE;
END;
$$;

SELECT fill_testcase(500000);

CREATE INDEX testcase__name__idx ON TestCase(name);
CREATE INDEX testcase__lower__idx ON TestCase(lower(name));
CREATE INDEX testcase__munge__idx ON TestCase(munge(name));
CREATE INDEX testcase__alt_name__idx ON TestCase(alt_name);

COMMIT;

ANALYZE TestCase;

EXPLAIN ANALYZE SELECT * FROM TestCase ORDER BY name;
EXPLAIN ANALYZE SELECT * FROM TestCase ORDER BY lower(name);
EXPLAIN ANALYZE SELECT * FROM TestCase ORDER BY munge(name);
EXPLAIN ANALYZE SELECT * FROM TestCase ORDER BY alt_name;

EXPLAIN ANALYZE SELECT * FROM TestCase ORDER BY name;
EXPLAIN ANALYZE SELECT * FROM TestCase ORDER BY lower(name);
EXPLAIN ANALYZE SELECT * FROM TestCase ORDER BY munge(name);
EXPLAIN ANALYZE SELECT * FROM TestCase ORDER BY alt_name;


--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/


Вложения

Re: Slow functional indexes?

От
Tom Lane
Дата:
Stuart Bishop <stuart@stuartbishop.net> writes:
> Here is a minimal test case that demonstrates the issue. Can anyone else
> reproduce these results? Of the four EXPLAIN ANALYZE SELECT statements at
> the end, the one that orders by a user created IMMUTABLE stored procedure is
> consistently slower than the other three variants.

Wow, interesting.  I'm surprised we never realized this before, but
here's the deal: the generated plan computes the ORDER BY expressions
even if we end up not needing them because the ordering is created by
an indexscan rather than an explicit sort step.  (Such a sort step would
of course need the values as input.)  So the differential you're seeing
represents the time for all those useless evaluations of the function.
The difference in the estimated cost comes from that too --- the code
doing the estimation can see perfectly well that there's an extra
function call in the plan ...

Not sure whether there's a simple way to fix this; it might take some
nontrivial rejiggering in the planner.  Or maybe not, but I don't have
any cute ideas about it at the moment.

I wonder whether there are any other cases where we are doing useless
computations of resjunk columns?

            regards, tom lane

Re: Slow functional indexes?

От
Gene
Дата:
I have a varchar field which is most commonly queried like "someField like '%abcd'". Realizing that it wouldn't be able to use an index for this type of query I created a reverse() function and an index using the function reverse(someField) so that the query would be performed as "reverse(someField) like reverse('%abcd')". When I looked at the query plan it seemed like it was using the new reverse index properly but also seemed to run slower. Would this explain these bazaar results? I have since gone back to the method without using the reverse function. Thanks

On 11/5/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Stuart Bishop <stuart@stuartbishop.net> writes:
> Here is a minimal test case that demonstrates the issue. Can anyone else
> reproduce these results? Of the four EXPLAIN ANALYZE SELECT statements at
> the end, the one that orders by a user created IMMUTABLE stored procedure is
> consistently slower than the other three variants.

Wow, interesting.  I'm surprised we never realized this before, but
here's the deal: the generated plan computes the ORDER BY expressions
even if we end up not needing them because the ordering is created by
an indexscan rather than an explicit sort step.  (Such a sort step would
of course need the values as input.)  So the differential you're seeing
represents the time for all those useless evaluations of the function.
The difference in the estimated cost comes from that too --- the code
doing the estimation can see perfectly well that there's an extra
function call in the plan ...

Not sure whether there's a simple way to fix this; it might take some
nontrivial rejiggering in the planner.  Or maybe not, but I don't have
any cute ideas about it at the moment.

I wonder whether there are any other cases where we are doing useless
computations of resjunk columns?

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@postgresql.org so that your
       message can get through to the mailing list cleanly



--
Gene Hart
cell: 443-604-2679