Обсуждение: Postgresql's table & index compared to that of MySQL

Поиск
Список
Период
Сортировка

Postgresql's table & index compared to that of MySQL

От
Andy
Дата:
For the same data set, with mostly text data, how does the data (table + index) size of Postgresql compared to that of
MySQL?

In this presentation, the largest blog site in Japan talked about their migration from Postgresql to MySQL. One of
theirreasons for moving away from Postgresql was that data size in Postgresql was too large (p. 12 & p. 41).
Specificallythey talked about index being 40% of total data size:  

http://www.scribd.com/doc/2569473/Migrating-from-PostgreSQL-to-MySQL-at-Cocolog-Japans-Largest-Blog-Community

Are there any reasons why table & index sizes of Postgresql should be larger than MySQL? Postgresql uses MVCC while
InnoDBdoes not use "full" MVCC, so perhaps that's a factor there. 

Does anyone have any actual experience about how the data sizes of Postgresql & MySQL compare to each other?

The company in the presentation used Postgresql 8.1. Has there been any significant changes in data size between 8.1
and8.4/9.0? 

Thanks.




Re: Postgresql's table & index compared to that of MySQL

От
Tom Lane
Дата:
Andy <angelflow@yahoo.com> writes:
> Are there any reasons why table & index sizes of Postgresql should be larger than MySQL?

Well, the per-row headers in PG are definitely bigger because of MVCC
requirements.  It's hard to speculate much about index sizes with
no information about the table/index schemas.

> The company in the presentation used Postgresql 8.1. Has there been any significant changes in data size between 8.1
and8.4/9.0? 

Well, we shaved 4 bytes off the tuple header size since 8.1, and there's
been work on cutting per-field overhead too, and there's now some
ability to control fillfactor in indexes.  But there's really not enough
information here to say how much difference this might've made for them.

            regards, tom lane

Re: Postgresql's table & index compared to that of MySQL

От
Thom Brown
Дата:
On 16 August 2010 23:59, Andy <angelflow@yahoo.com> wrote:
> For the same data set, with mostly text data, how does the data (table + index) size of Postgresql compared to that
ofMySQL? 
>
> In this presentation, the largest blog site in Japan talked about their migration from Postgresql to MySQL. One of
theirreasons for moving away from Postgresql was that data size in Postgresql was too large (p. 12 & p. 41).
Specificallythey talked about index being 40% of total data size: 
>
> http://www.scribd.com/doc/2569473/Migrating-from-PostgreSQL-to-MySQL-at-Cocolog-Japans-Largest-Blog-Community
>
> Are there any reasons why table & index sizes of Postgresql should be larger than MySQL? Postgresql uses MVCC while
InnoDBdoes not use "full" MVCC, so perhaps that's a factor there. 
>
> Does anyone have any actual experience about how the data sizes of Postgresql & MySQL compare to each other?
>
> The company in the presentation used Postgresql 8.1. Has there been any significant changes in data size between 8.1
and8.4/9.0? 
>
> Thanks.
>

This is quite a crude and unrealistic test (as you'd need a set of
real-world data), but just did a quick test using PostgreSQL 9.0 alpha
4 and MySQL .  I created a new database in both PostgreSQL and MySQL.
Created the same table in both, and loaded identical data (50
paragraphs of lorem ipsum) and got it to insert the table's contents
back into itself until both reached 65,536 rows.  I also did a VACUUM
in PostgreSQL and an OPTIMIZE TABLE in MySQL.

PostgreSQL's table size shows 867 MB
MySQL's table size as MyISAM shows 2,542 MB
MySQL's table size as InnoDB shows: 3,576 MB

Also bear in mind that MySQL's InnoDB engine doesn't support full text
indexes, and when you can apply full text indexes, it only returns a
result if it matches less than 50% of the total rows in the table.

PostgreSQL provides GIN and GiST types of index which are used for
full text searches, but off the top of my head I don't know if either
is actually equivalent to MySQL's implementation.  I suspect they're
quite different.  Hopefully someone more familiar with both system's
full text search features can answer that.

--
Thom Brown
Registered Linux user: #516935

Re: Postgresql's table & index compared to that of MySQL

От
Andy
Дата:

--- On Mon, 8/16/10, Thom Brown <thom@linux.com> wrote:

> This is quite a crude and unrealistic test (as you'd need a
> set of
> real-world data), but just did a quick test using
> PostgreSQL 9.0 alpha
> 4 and MySQL .  I created a new database in both
> PostgreSQL and MySQL.
> Created the same table in both, and loaded identical data
> (50
> paragraphs of lorem ipsum) and got it to insert the table's
> contents
> back into itself until both reached 65,536 rows.  I
> also did a VACUUM
> in PostgreSQL and an OPTIMIZE TABLE in MySQL.
>
> PostgreSQL's table size shows 867 MB
> MySQL's table size as MyISAM shows 2,542 MB
> MySQL's table size as InnoDB shows: 3,576 MB
>
> Also bear in mind that MySQL's InnoDB engine doesn't
> support full text
> indexes, and when you can apply full text indexes, it only
> returns a
> result if it matches less than 50% of the total rows in the
> table.
>
> PostgreSQL provides GIN and GiST types of index which are
> used for
> full text searches, but off the top of my head I don't know
> if either
> is actually equivalent to MySQL's implementation.  I
> suspect they're
> quite different.  Hopefully someone more familiar with
> both system's
> full text search features can answer that.
>

Thanks for doing the test.

Your results of 867MB for Postgresql & 3,576 MB for InnoDB are surprising. Do you know why it is so much smaller for
Postgresql?Are there any indexes? 

Are all Postgresql indexes based on GIN & GiST? I'm not using the database for full text search, would I still be using
GIN/GiSTindexes, or would I be using the plain old B+ tree? 




Re: Postgresql's table & index compared to that of MySQL

От
Tom Lane
Дата:
Andy <angelflow@yahoo.com> writes:
> Your results of 867MB for Postgresql & 3,576 MB for InnoDB are surprising. Do you know why it is so much smaller for
Postgresql?Are there any indexes? 

If I understood the original report correctly, they were complaining
mostly about index size, so a table without indexes certainly isn't
a real helpful comparison.  Still, this brings up an important point:
AFAICS the paper doesn't even mention which mysql storage engine they're
using.  So it's *really* hard to tell what we're comparing to.

> Are all Postgresql indexes based on GIN & GiST?

No, certainly not.  See
http://www.postgresql.org/docs/8.4/static/indexes-types.html

            regards, tom lane

Re: Postgresql's table & index compared to that of MySQL

От
Craig Ringer
Дата:
On 17/08/10 06:59, Andy wrote:

> http://www.scribd.com/doc/2569473/Migrating-from-PostgreSQL-to-MySQL-at-Cocolog-Japans-Largest-Blog-Community
>
> Are there any reasons why table & index sizes of Postgresql should be larger than MySQL? Postgresql uses MVCC while
InnoDBdoes not use "full" MVCC, so perhaps that's a factor there. 

I can't help but wonder if they were doing regular VACUUM FULL
"maintenance" and being bitten by issues with index bloat as a
consequence. Though after reading the document linked, it's hard to
imagine they would be falling for a mistake that basic.

OTOH, a key clue there is:

  We don't use Auto VACUUM since we are worried about latent
  response time.

ie they wanted to avoid response latency spikes and thought/found they'd
be caused by autovacuum. Given that their discussion is of 7.4 to 8.1,
that's not unreasonable.

It looks to me like their *real* issue was just that they had problems
scaling a single PostgreSQL instance up to meet their needs, and were
having issues with vacuum under continuous high loads. They decided to
move to a sharded system, and moved over to MySQL at that time. That's
not necessarily a bad decision, as MySQL is more widely used in sharded
setups and there are some good tools to help with that sort of use.

It's pretty funny that they had issues with charset handling under Pg,
though, as most people seem to complain that Pg is too strict. Any ideas
what they might be talking about with:

  "PostgreSQL allow the out of boundary UTF-8 japanese extended
   character sets and multi bytes character sets which should
   normally come back with an error - instead of accepting them"

... and whether it's related to existing or past issues? Or might they
have just been using SQL_ASCII?

There are also some clues that the way they were doing things may not
have been ideal. For example, they mention as a "problem" with MySQL
that you get a "different sort order without order by clause". I
struggle to interpret that in any way but that they were relying on the
undefined ordering of records in a query without an ORDER BY, which is
just silly.

--
Craig Ringer

Re: Postgresql's table & index compared to that of MySQL

От
Thom Brown
Дата:
On 17 August 2010 04:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andy <angelflow@yahoo.com> writes:
>> Your results of 867MB for Postgresql & 3,576 MB for InnoDB are surprising. Do you know why it is so much smaller for
Postgresql?Are there any indexes? 
>
> If I understood the original report correctly, they were complaining
> mostly about index size, so a table without indexes certainly isn't
> a real helpful comparison.

Yeah, I did attempt to create a full text GIN index on that last
night, but it was taking ages and it was getting late, so abandoned
it.  If you're interested, I set up one on MySQL's version (MyISAM of
course) and it was around 108 MB.  The problem is, if PostgreSQL's
index was, say, 600 MB, it might still not be fair to compare it since
they make not really be equivalent.

But those slides leave a lot of important information out.  And even
if it clearly explained everything in detail, they're talking about
7.4 and 8.0.  The world has changed since then.

--
Thom Brown
Registered Linux user: #516935

Re: Postgresql's table & index compared to that of MySQL

От
Thom Brown
Дата:
On 17 August 2010 13:45, Thom Brown <thom@linux.com> wrote:
> On 17 August 2010 04:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Andy <angelflow@yahoo.com> writes:
>>> Your results of 867MB for Postgresql & 3,576 MB for InnoDB are surprising. Do you know why it is so much smaller
forPostgresql? Are there any indexes? 
>>
>> If I understood the original report correctly, they were complaining
>> mostly about index size, so a table without indexes certainly isn't
>> a real helpful comparison.
>
> Yeah, I did attempt to create a full text GIN index on that last
> night, but it was taking ages and it was getting late, so abandoned
> it.  If you're interested, I set up one on MySQL's version (MyISAM of
> course) and it was around 108 MB.  The problem is, if PostgreSQL's
> index was, say, 600 MB, it might still not be fair to compare it since
> they make not really be equivalent.
>
> But those slides leave a lot of important information out.  And even
> if it clearly explained everything in detail, they're talking about
> 7.4 and 8.0.  The world has changed since then.
>
Okay, I've left the creation of 2 full text indexes, one using GIN and
another using GiST. GIN comes up with 72 MB and GiST 21 MB.

But again, this is all rather synthetic and the data I've used
contains duplicate content.  As for VACUUM performance hits, this has
changed since 8.0 too.  8.2 came with more efficient index VACUUMing.
8.3 introduced Heap-Only Tuples which allow dead tuples to be reused.
And VACUUM is also tunable in the config.

--
Thom Brown
Registered Linux user: #516935

Re: Postgresql's table & index compared to that of MySQL

От
Thom Brown
Дата:
On 17 August 2010 16:00, Thom Brown <thom@linux.com> wrote:
> On 17 August 2010 13:45, Thom Brown <thom@linux.com> wrote:
>> On 17 August 2010 04:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Andy <angelflow@yahoo.com> writes:
>>>> Your results of 867MB for Postgresql & 3,576 MB for InnoDB are surprising. Do you know why it is so much smaller
forPostgresql? Are there any indexes? 
>>>
>>> If I understood the original report correctly, they were complaining
>>> mostly about index size, so a table without indexes certainly isn't
>>> a real helpful comparison.
>>
>> Yeah, I did attempt to create a full text GIN index on that last
>> night, but it was taking ages and it was getting late, so abandoned
>> it.  If you're interested, I set up one on MySQL's version (MyISAM of
>> course) and it was around 108 MB.  The problem is, if PostgreSQL's
>> index was, say, 600 MB, it might still not be fair to compare it since
>> they make not really be equivalent.
>>
>> But those slides leave a lot of important information out.  And even
>> if it clearly explained everything in detail, they're talking about
>> 7.4 and 8.0.  The world has changed since then.
>>
> Okay, I've left the creation of 2 full text indexes, one using GIN and
> another using GiST. GIN comes up with 72 MB and GiST 21 MB.
>
> But again, this is all rather synthetic and the data I've used
> contains duplicate content.  As for VACUUM performance hits, this has
> changed since 8.0 too.  8.2 came with more efficient index VACUUMing.
> 8.3 introduced Heap-Only Tuples which allow dead tuples to be reused.
> And VACUUM is also tunable in the config.
>

Actually, if MySQL won't return anything which occurs in 50% or more
of the rows, and all the rows in my test were duplicates, what's it
spending 108 MB on if there's no full text query I can use which can
return results?

--
Thom Brown
Registered Linux user: #516935