Обсуждение: Postgresql's table & index compared to that of MySQL
For the same data set, with mostly text data, how does the data (table + index) size of Postgresql compared to that of MySQL? In this presentation, the largest blog site in Japan talked about their migration from Postgresql to MySQL. One of theirreasons for moving away from Postgresql was that data size in Postgresql was too large (p. 12 & p. 41). Specificallythey talked about index being 40% of total data size: http://www.scribd.com/doc/2569473/Migrating-from-PostgreSQL-to-MySQL-at-Cocolog-Japans-Largest-Blog-Community Are there any reasons why table & index sizes of Postgresql should be larger than MySQL? Postgresql uses MVCC while InnoDBdoes not use "full" MVCC, so perhaps that's a factor there. Does anyone have any actual experience about how the data sizes of Postgresql & MySQL compare to each other? The company in the presentation used Postgresql 8.1. Has there been any significant changes in data size between 8.1 and8.4/9.0? Thanks.
Andy <angelflow@yahoo.com> writes: > Are there any reasons why table & index sizes of Postgresql should be larger than MySQL? Well, the per-row headers in PG are definitely bigger because of MVCC requirements. It's hard to speculate much about index sizes with no information about the table/index schemas. > The company in the presentation used Postgresql 8.1. Has there been any significant changes in data size between 8.1 and8.4/9.0? Well, we shaved 4 bytes off the tuple header size since 8.1, and there's been work on cutting per-field overhead too, and there's now some ability to control fillfactor in indexes. But there's really not enough information here to say how much difference this might've made for them. regards, tom lane
On 16 August 2010 23:59, Andy <angelflow@yahoo.com> wrote: > For the same data set, with mostly text data, how does the data (table + index) size of Postgresql compared to that ofMySQL? > > In this presentation, the largest blog site in Japan talked about their migration from Postgresql to MySQL. One of theirreasons for moving away from Postgresql was that data size in Postgresql was too large (p. 12 & p. 41). Specificallythey talked about index being 40% of total data size: > > http://www.scribd.com/doc/2569473/Migrating-from-PostgreSQL-to-MySQL-at-Cocolog-Japans-Largest-Blog-Community > > Are there any reasons why table & index sizes of Postgresql should be larger than MySQL? Postgresql uses MVCC while InnoDBdoes not use "full" MVCC, so perhaps that's a factor there. > > Does anyone have any actual experience about how the data sizes of Postgresql & MySQL compare to each other? > > The company in the presentation used Postgresql 8.1. Has there been any significant changes in data size between 8.1 and8.4/9.0? > > Thanks. > This is quite a crude and unrealistic test (as you'd need a set of real-world data), but just did a quick test using PostgreSQL 9.0 alpha 4 and MySQL . I created a new database in both PostgreSQL and MySQL. Created the same table in both, and loaded identical data (50 paragraphs of lorem ipsum) and got it to insert the table's contents back into itself until both reached 65,536 rows. I also did a VACUUM in PostgreSQL and an OPTIMIZE TABLE in MySQL. PostgreSQL's table size shows 867 MB MySQL's table size as MyISAM shows 2,542 MB MySQL's table size as InnoDB shows: 3,576 MB Also bear in mind that MySQL's InnoDB engine doesn't support full text indexes, and when you can apply full text indexes, it only returns a result if it matches less than 50% of the total rows in the table. PostgreSQL provides GIN and GiST types of index which are used for full text searches, but off the top of my head I don't know if either is actually equivalent to MySQL's implementation. I suspect they're quite different. Hopefully someone more familiar with both system's full text search features can answer that. -- Thom Brown Registered Linux user: #516935
--- On Mon, 8/16/10, Thom Brown <thom@linux.com> wrote: > This is quite a crude and unrealistic test (as you'd need a > set of > real-world data), but just did a quick test using > PostgreSQL 9.0 alpha > 4 and MySQL . I created a new database in both > PostgreSQL and MySQL. > Created the same table in both, and loaded identical data > (50 > paragraphs of lorem ipsum) and got it to insert the table's > contents > back into itself until both reached 65,536 rows. I > also did a VACUUM > in PostgreSQL and an OPTIMIZE TABLE in MySQL. > > PostgreSQL's table size shows 867 MB > MySQL's table size as MyISAM shows 2,542 MB > MySQL's table size as InnoDB shows: 3,576 MB > > Also bear in mind that MySQL's InnoDB engine doesn't > support full text > indexes, and when you can apply full text indexes, it only > returns a > result if it matches less than 50% of the total rows in the > table. > > PostgreSQL provides GIN and GiST types of index which are > used for > full text searches, but off the top of my head I don't know > if either > is actually equivalent to MySQL's implementation. I > suspect they're > quite different. Hopefully someone more familiar with > both system's > full text search features can answer that. > Thanks for doing the test. Your results of 867MB for Postgresql & 3,576 MB for InnoDB are surprising. Do you know why it is so much smaller for Postgresql?Are there any indexes? Are all Postgresql indexes based on GIN & GiST? I'm not using the database for full text search, would I still be using GIN/GiSTindexes, or would I be using the plain old B+ tree?
Andy <angelflow@yahoo.com> writes: > Your results of 867MB for Postgresql & 3,576 MB for InnoDB are surprising. Do you know why it is so much smaller for Postgresql?Are there any indexes? If I understood the original report correctly, they were complaining mostly about index size, so a table without indexes certainly isn't a real helpful comparison. Still, this brings up an important point: AFAICS the paper doesn't even mention which mysql storage engine they're using. So it's *really* hard to tell what we're comparing to. > Are all Postgresql indexes based on GIN & GiST? No, certainly not. See http://www.postgresql.org/docs/8.4/static/indexes-types.html regards, tom lane
On 17/08/10 06:59, Andy wrote: > http://www.scribd.com/doc/2569473/Migrating-from-PostgreSQL-to-MySQL-at-Cocolog-Japans-Largest-Blog-Community > > Are there any reasons why table & index sizes of Postgresql should be larger than MySQL? Postgresql uses MVCC while InnoDBdoes not use "full" MVCC, so perhaps that's a factor there. I can't help but wonder if they were doing regular VACUUM FULL "maintenance" and being bitten by issues with index bloat as a consequence. Though after reading the document linked, it's hard to imagine they would be falling for a mistake that basic. OTOH, a key clue there is: We don't use Auto VACUUM since we are worried about latent response time. ie they wanted to avoid response latency spikes and thought/found they'd be caused by autovacuum. Given that their discussion is of 7.4 to 8.1, that's not unreasonable. It looks to me like their *real* issue was just that they had problems scaling a single PostgreSQL instance up to meet their needs, and were having issues with vacuum under continuous high loads. They decided to move to a sharded system, and moved over to MySQL at that time. That's not necessarily a bad decision, as MySQL is more widely used in sharded setups and there are some good tools to help with that sort of use. It's pretty funny that they had issues with charset handling under Pg, though, as most people seem to complain that Pg is too strict. Any ideas what they might be talking about with: "PostgreSQL allow the out of boundary UTF-8 japanese extended character sets and multi bytes character sets which should normally come back with an error - instead of accepting them" ... and whether it's related to existing or past issues? Or might they have just been using SQL_ASCII? There are also some clues that the way they were doing things may not have been ideal. For example, they mention as a "problem" with MySQL that you get a "different sort order without order by clause". I struggle to interpret that in any way but that they were relying on the undefined ordering of records in a query without an ORDER BY, which is just silly. -- Craig Ringer
On 17 August 2010 04:05, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andy <angelflow@yahoo.com> writes: >> Your results of 867MB for Postgresql & 3,576 MB for InnoDB are surprising. Do you know why it is so much smaller for Postgresql?Are there any indexes? > > If I understood the original report correctly, they were complaining > mostly about index size, so a table without indexes certainly isn't > a real helpful comparison. Yeah, I did attempt to create a full text GIN index on that last night, but it was taking ages and it was getting late, so abandoned it. If you're interested, I set up one on MySQL's version (MyISAM of course) and it was around 108 MB. The problem is, if PostgreSQL's index was, say, 600 MB, it might still not be fair to compare it since they make not really be equivalent. But those slides leave a lot of important information out. And even if it clearly explained everything in detail, they're talking about 7.4 and 8.0. The world has changed since then. -- Thom Brown Registered Linux user: #516935
On 17 August 2010 13:45, Thom Brown <thom@linux.com> wrote: > On 17 August 2010 04:05, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Andy <angelflow@yahoo.com> writes: >>> Your results of 867MB for Postgresql & 3,576 MB for InnoDB are surprising. Do you know why it is so much smaller forPostgresql? Are there any indexes? >> >> If I understood the original report correctly, they were complaining >> mostly about index size, so a table without indexes certainly isn't >> a real helpful comparison. > > Yeah, I did attempt to create a full text GIN index on that last > night, but it was taking ages and it was getting late, so abandoned > it. If you're interested, I set up one on MySQL's version (MyISAM of > course) and it was around 108 MB. The problem is, if PostgreSQL's > index was, say, 600 MB, it might still not be fair to compare it since > they make not really be equivalent. > > But those slides leave a lot of important information out. And even > if it clearly explained everything in detail, they're talking about > 7.4 and 8.0. The world has changed since then. > Okay, I've left the creation of 2 full text indexes, one using GIN and another using GiST. GIN comes up with 72 MB and GiST 21 MB. But again, this is all rather synthetic and the data I've used contains duplicate content. As for VACUUM performance hits, this has changed since 8.0 too. 8.2 came with more efficient index VACUUMing. 8.3 introduced Heap-Only Tuples which allow dead tuples to be reused. And VACUUM is also tunable in the config. -- Thom Brown Registered Linux user: #516935
On 17 August 2010 16:00, Thom Brown <thom@linux.com> wrote: > On 17 August 2010 13:45, Thom Brown <thom@linux.com> wrote: >> On 17 August 2010 04:05, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Andy <angelflow@yahoo.com> writes: >>>> Your results of 867MB for Postgresql & 3,576 MB for InnoDB are surprising. Do you know why it is so much smaller forPostgresql? Are there any indexes? >>> >>> If I understood the original report correctly, they were complaining >>> mostly about index size, so a table without indexes certainly isn't >>> a real helpful comparison. >> >> Yeah, I did attempt to create a full text GIN index on that last >> night, but it was taking ages and it was getting late, so abandoned >> it. If you're interested, I set up one on MySQL's version (MyISAM of >> course) and it was around 108 MB. The problem is, if PostgreSQL's >> index was, say, 600 MB, it might still not be fair to compare it since >> they make not really be equivalent. >> >> But those slides leave a lot of important information out. And even >> if it clearly explained everything in detail, they're talking about >> 7.4 and 8.0. The world has changed since then. >> > Okay, I've left the creation of 2 full text indexes, one using GIN and > another using GiST. GIN comes up with 72 MB and GiST 21 MB. > > But again, this is all rather synthetic and the data I've used > contains duplicate content. As for VACUUM performance hits, this has > changed since 8.0 too. 8.2 came with more efficient index VACUUMing. > 8.3 introduced Heap-Only Tuples which allow dead tuples to be reused. > And VACUUM is also tunable in the config. > Actually, if MySQL won't return anything which occurs in 50% or more of the rows, and all the rows in my test were duplicates, what's it spending 108 MB on if there's no full text query I can use which can return results? -- Thom Brown Registered Linux user: #516935