Re: Query plan and Inheritance. Weird behavior
От | Andras Kadinger |
---|---|
Тема | Re: Query plan and Inheritance. Weird behavior |
Дата | |
Msg-id | 3E3879B9.F9060B14@surfnonstop.com обсуждение исходный текст |
Ответ на | Re: Query plan and Inheritance. Weird behavior (John Lange <lists@darkcore.net>) |
Список | pgsql-performance |
John Lange wrote: > > On Wed, 2003-01-29 at 01:52, Andras Kadinger wrote: > > As Tom pointed out, only the schema is copied, but not the data. > > I guess you are right, strictly speaking this isn't a violation of > normalization since no data is duplicated. [...] > Ok, your reply here is very informative. Firstly, I can see from your > example that I likely don't have my keys and constraints implemented > properly. > > However, the issue of indexes is not necessarily that relevant since you > may not be selecting rows based on columns that have indexes. Granted, now I see that was not strictly related to your point, I just wanted to avoid most avoidable objections against performance of inheritance, and I wasn't 100% sure you seeing seq scans was not part of you thinking performance of this method would be suboptimal so just to be sure, I explicitly went for an example with indexes. > So the issue of indexes aside, I think some of the misunderstanding is > related to my assumption that the appended operations are relatively > more expensive than scanning the same number of rows in a single select. I see. Well, the Append step itself I suppose is not doing much else than iterates over its subnodes and asks each of them to return their rows, and forwards the rows upwards to the rest of the plan as it receives them - it doesn't buffer them, or collect them all before forwarding them upwards (I think the Materialize step that were to be seen in an example in my last PS is the one that does that). So I don't think the Append incurs any significant costs much more than a few CPU cycles for that iteration and row forwarding (pass of one pointer to in-memory row I guess) steps - I think these are minuscule compared to the cost of any disk I/O, and in most non-CPU-bound queries are hidden by disk throughput/latency anyway. > Here is the way it looks on my system when I select a single object. > > db_drs0001=> explain select * from tbl_objects where id = 1; > NOTICE: QUERY PLAN: > > Result (cost=0.00..153.70 rows=6 width=111) > -> Append (cost=0.00..153.70 rows=6 width=111) > -> Seq Scan on tbl_objects (cost=0.00..144.35 rows=1 width=107) > -> Seq Scan on tbl_viewers tbl_objects (cost=0.00..1.06 rows=1 > width=97) > -> Seq Scan on tbl_documents tbl_objects (cost=0.00..4.91 rows=1 > width=111) > -> Seq Scan on tbl_formats tbl_objects (cost=0.00..1.11 rows=1 > width=100) > -> Seq Scan on tbl_massemails tbl_objects (cost=0.00..1.01 rows=1 > width=90) > -> Seq Scan on tbl_icons tbl_objects (cost=0.00..1.25 rows=1 > width=110) > > db_drs0001=> select version(); > version > --------------------------------------------------------------- > PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.95.3 > (1 row) > > The only question here is, if a select requires the scanning of all rows > to return a result set, is it dramatically less efficient to have it > scanning 100,000 rows spread over 6 tables or to scan 100,000 rows in a > single table? I see. Well, the Append step itself I suppose isn't doing much more than iterates over its subnodes (the seq scans in the case above) and asks each of them to return rows, and forwards the rows upwards to the rest of the plan as it receives them - it doesn't buffer them, or collect them all before forwarding them upwards (I think the Materialize step that were to be seen in an example in my last PS does that). So aside for any costs of consulting indexes, I don't think the Append step - which is the added step when scanning multiple tables versus scanning one table - incurs any significant costs much more than a few CPU cycles for those iteration and row forwarding (pass of one pointer to in-memory row I guess) steps - I think these are minuscule compared to the cost of any disk I/O, and in most non-CPU-bound queries are hidden by disk throughput/latency anyway. > (At the moment I have no where near that amount of data. Side question, > what technique do you use to generate data to fill your tables for > testing?) For this occasion I just went and created a dozen-line PHP script that simply inserted 10000 rows with consecutive ids into each table. I suggest you to try to populate your test database with test data on the order of your expected working data set and use EXPLAIN ANALYZE to make estimates of expected performance of the database. > I'm now starting to see that it likely isn't that much different either > way so the benefits of the way it's implemented probably out weigh the > negatives. Your end up scanning the same number of rows either way. Aside from extreme cases where child rows are considerably much more wider than parent rows and thus result in considerably more data needed to be read in case of a sequential scan, yes. > On the topic of proper indexes, if you would indulge me, can you show me > where I have gone wrong in that regard? Hmm, I think you should only have gone and created indexes for child tables by hand, as indexes are not inherited. Also, don't forget, PostgreSQL has an advanced query cost estimation subsystem, which decides for or against using an index based on, among others, statistics collected on distribution of values in the index to determine its selectivity (so don't forget to ANALYZE/VACUUM ANALYZE after inserting/changing a lot of rows that significantly change distribution of values - this includes initial table fillup), and also it accounts for costs of accessing index pages, so with less than say a couple of thousand rows or with not very selective indexes it will (rightly) decide not to use the index but do a seq scan instead - probably the reason for why you don't see an index scan on tbl_objects above despite there being an index on the primary key. > My biggest point of confusion > here is with regards to the sequences that are used in the parent table. Child tables inherit the "nextval('...')" default value, so as a result they will all draw from the same one sequence, which sequence exists outside of the tables; as a result as long as you use that default value, it is guaranteed that the column in question will have unique values among all tables parent and children; they won't be consecutive - but that's not a drawback of inheritance either, as a sequence is not guaranteed to provide consecutive numbers with single tables either due to transaction concurrency (rolled back transactions don't 'put back' numbers into the sequence, so in the case of rolled back transactions there will be numbers drawn from the sequence that never actually get into any table - this is nicely documented with sequences and transactions). > Here is the schema as produced by pg_dump. The original create used the > keyword "serial" or "bigserial" as the case may be. I've edited some of > the columns out just to keep the example shorter: > > CREATE SEQUENCE "tbl_objects_id_seq" start 1 increment 1 maxvalue > 9223372036854775807 minvalue 1 cache 1; > > CREATE TABLE "tbl_objects" ( > "id" bigint DEFAULT nextval('"tbl_objects_id_seq"'::text) NOT NULL, > "name" text DEFAULT '' NOT NULL, > "description" text DEFAULT '' NOT NULL, > "status" smallint DEFAULT '1' NOT NULL, > "class" text > ); > > CREATE TABLE "tbl_viewers" ( > "exec" text DEFAULT '' NOT NULL ) > INHERITS ("tbl_objects"); > > CREATE TABLE "tbl_documents" ( > "filename" text DEFAULT '' NOT NULL ) > INHERITS ("tbl_objects"); > > CREATE TABLE "tbl_massemails" ( > "from" text DEFAULT '' NOT NULL, > "subject" text DEFAULT '' NOT NULL, > "message" text DEFAULT '' NOT NULL ) > INHERITS ("tbl_objects"); > > CREATE TABLE "tbl_icons" ( > "format_id" bigint DEFAULT '0' NOT NULL ) > INHERITS ("tbl_documents"); > > CREATE TABLE "tbl_formats" ( > "viewer_id" bigint DEFAULT '0' NOT NULL, > "extension" text DEFAULT '' NOT NULL, > "contenttype" text DEFAULT '' NOT NULL, > "upload_class" text ) > INHERITS ("tbl_objects"); > > CREATE UNIQUE INDEX tbl_objects_id_key ON tbl_objects USING btree (id); Hmm, I wonder whether you have a specific goal with or reason for explicitly specifying NOT NULL and empty string ('') as default value for all these text fields? If it's just because your frontend makes it inconvenient for you to treat a NULL as empty string, you might want to consider allowing NULLs and using the coalesce() function in your select - this would incur a few CPU cycles per returned result row, but will spare you a few bytes in storage - I think 4 or 8 per column - for each NULL value. Whether this is worth it or not depends on the percentage of empty/NULL values in your data though. > Thanks very much for taking the time to look into this with me. It has > been most informative. You're welcome! Regards, Andras
В списке pgsql-performance по дате отправления: