Re: Query plan and Inheritance. Weird behavior

Поиск
Список
Период
Сортировка
От Andras Kadinger
Тема Re: Query plan and Inheritance. Weird behavior
Дата
Msg-id 3E3879B9.F9060B14@surfnonstop.com
обсуждение исходный текст
Ответ на Re: Query plan and Inheritance. Weird behavior  (John Lange <lists@darkcore.net>)
Список pgsql-performance
John Lange wrote:
>
> On Wed, 2003-01-29 at 01:52, Andras Kadinger wrote:
> > As Tom pointed out, only the schema is copied, but not the data.
>
> I guess you are right, strictly speaking this isn't a violation of
> normalization since no data is duplicated.

[...]

> Ok, your reply here is very informative. Firstly, I can see from your
> example that I likely don't have my keys and constraints implemented
> properly.
>
> However, the issue of indexes is not necessarily that relevant since you
> may not be selecting rows based on columns that have indexes.

Granted, now I see that was not strictly related to your point, I just
wanted to avoid most avoidable objections against performance of
inheritance, and I wasn't 100% sure you seeing seq scans was not part of
you thinking performance of this method would be suboptimal so just to
be sure, I explicitly went for an example with indexes.

> So the issue of indexes aside, I think some of the misunderstanding is
> related to my assumption that the appended operations are relatively
> more expensive than scanning the same number of rows in a single select.

I see. Well, the Append step itself I suppose is not doing much else
than iterates over its subnodes and asks each of them to return their
rows, and forwards the rows upwards to the rest of the plan as it
receives them - it doesn't buffer them, or collect them all before
forwarding them upwards (I think the Materialize step that were to be
seen in an example in my last PS is the one that does that).

So I don't think the Append incurs any significant costs much more than
a few CPU cycles for that iteration and row forwarding (pass of one
pointer to in-memory row I guess) steps - I think these are minuscule
compared to the cost of any disk I/O, and in most non-CPU-bound queries
are hidden by disk throughput/latency anyway.

> Here is the way it looks on my system when I select a single object.
>
> db_drs0001=> explain select * from tbl_objects where id = 1;
> NOTICE:  QUERY PLAN:
>
> Result  (cost=0.00..153.70 rows=6 width=111)
>  ->  Append  (cost=0.00..153.70 rows=6 width=111)
>   ->  Seq Scan on tbl_objects  (cost=0.00..144.35 rows=1 width=107)
>   ->  Seq Scan on tbl_viewers tbl_objects  (cost=0.00..1.06 rows=1
> width=97)
>   ->  Seq Scan on tbl_documents tbl_objects  (cost=0.00..4.91 rows=1
> width=111)
>   ->  Seq Scan on tbl_formats tbl_objects  (cost=0.00..1.11 rows=1
> width=100)
>   ->  Seq Scan on tbl_massemails tbl_objects  (cost=0.00..1.01 rows=1
> width=90)
>   ->  Seq Scan on tbl_icons tbl_objects  (cost=0.00..1.25 rows=1
> width=110)
>
> db_drs0001=> select version();
>                             version
> ---------------------------------------------------------------
>  PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.95.3
> (1 row)
>
> The only question here is, if a select requires the scanning of all rows
> to return a result set, is it dramatically less efficient to have it
> scanning 100,000 rows spread over 6 tables or to scan 100,000 rows in a
> single table?

I see. Well, the Append step itself I suppose isn't doing much more than
iterates over its subnodes (the seq scans in the case above) and asks
each of them to return rows, and forwards the rows upwards to the rest
of the plan as it receives them - it doesn't buffer them, or collect
them all before forwarding them upwards (I think the Materialize step
that were to be seen in an example in my last PS does that). So aside
for any costs of consulting indexes, I don't think the Append step -
which is the added step when scanning multiple tables versus scanning
one table - incurs any significant costs much more than a few CPU cycles
for those iteration and row forwarding (pass of one pointer to in-memory
row I guess) steps - I think these are minuscule compared to the cost of
any disk I/O, and in most non-CPU-bound queries are hidden by disk
throughput/latency anyway.

> (At the moment I have no where near that amount of data. Side question,
> what technique do you use to generate data to fill your tables for
> testing?)

For this occasion I just went and created a dozen-line PHP script that
simply inserted 10000 rows with consecutive ids into each table.

I suggest you to try to populate your test database with test data on
the order of your expected working data set and use EXPLAIN ANALYZE to
make estimates of expected performance of the database.

> I'm now starting to see that it likely isn't that much different either
> way so the benefits of the way it's implemented probably out weigh the
> negatives. Your end up scanning the same number of rows either way.

Aside from extreme cases where child rows are considerably much more
wider than parent rows and thus result in considerably more data needed
to be read in case of a sequential scan, yes.

> On the topic of proper indexes, if you would indulge me, can you show me
> where I have gone wrong in that regard?

Hmm, I think you should only have gone and created indexes for child
tables by hand, as indexes are not inherited.

Also, don't forget, PostgreSQL has an advanced query cost estimation
subsystem, which decides for or against using an index based on, among
others, statistics collected on distribution of values in the index to
determine its selectivity (so don't forget to ANALYZE/VACUUM ANALYZE
after inserting/changing a lot of rows that significantly change
distribution of values - this includes initial table fillup), and also
it accounts for costs of accessing index pages, so with less than say a
couple of thousand rows or with not very selective indexes it will
(rightly) decide not to use the index but do a seq scan instead -
probably the reason for why you don't see an index scan on tbl_objects
above despite there being an index on the primary key.

> My biggest point of confusion
> here is with regards to the sequences that are used in the parent table.

Child tables inherit the "nextval('...')" default value, so as a result
they will all draw from the same one sequence, which sequence exists
outside of the tables; as a result as long as you use that default
value, it is guaranteed that the column in question will have unique
values among all tables parent and children; they won't be consecutive -
but that's not a drawback of inheritance either, as a sequence is not
guaranteed to provide consecutive numbers with single tables either due
to transaction concurrency (rolled back transactions don't 'put back'
numbers into the sequence, so in the case of rolled back transactions
there will be numbers drawn from the sequence that never actually get
into any table - this is nicely documented with sequences and
transactions).

> Here is the schema as produced by pg_dump. The original create used the
> keyword "serial" or "bigserial" as the case may be. I've edited some of
> the columns out just to keep the example shorter:
>
> CREATE SEQUENCE "tbl_objects_id_seq" start 1 increment 1 maxvalue
> 9223372036854775807 minvalue 1 cache 1;
>
> CREATE TABLE "tbl_objects" (
>    "id" bigint DEFAULT nextval('"tbl_objects_id_seq"'::text) NOT NULL,
>    "name" text DEFAULT '' NOT NULL,
>    "description" text DEFAULT '' NOT NULL,
>    "status" smallint DEFAULT '1' NOT NULL,
>    "class" text
> );
>
> CREATE TABLE "tbl_viewers" (
>         "exec" text DEFAULT '' NOT NULL )
> INHERITS ("tbl_objects");
>
> CREATE TABLE "tbl_documents" (
>         "filename" text DEFAULT '' NOT NULL )
> INHERITS ("tbl_objects");
>
> CREATE TABLE "tbl_massemails" (
>         "from" text DEFAULT '' NOT NULL,
>         "subject" text DEFAULT '' NOT NULL,
>         "message" text DEFAULT '' NOT NULL )
> INHERITS ("tbl_objects");
>
> CREATE TABLE "tbl_icons" (
>         "format_id" bigint DEFAULT '0' NOT NULL )
> INHERITS ("tbl_documents");
>
> CREATE TABLE "tbl_formats" (
>         "viewer_id" bigint DEFAULT '0' NOT NULL,
>         "extension" text DEFAULT '' NOT NULL,
>         "contenttype" text DEFAULT '' NOT NULL,
>         "upload_class" text )
> INHERITS ("tbl_objects");
>
> CREATE UNIQUE INDEX tbl_objects_id_key ON tbl_objects USING btree (id);

Hmm, I wonder whether you have a specific goal with or reason for
explicitly specifying NOT NULL and empty string ('') as default value
for all these text fields? If it's just because your frontend makes it
inconvenient for you to treat a NULL as empty string, you might want to
consider allowing NULLs and using the coalesce() function in your select
- this would incur a few CPU cycles per returned result row, but will
spare you a few bytes in storage - I think 4 or 8 per column - for each
NULL value. Whether this is worth it or not depends on the percentage of
empty/NULL values in your data though.

> Thanks very much for taking the time to look into this with me. It has
> been most informative.

You're welcome!

Regards,
Andras

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: 1 char in the world
Следующее
От: Anil Kumar
Дата:
Сообщение: Strangae Query Plans