Discussion: Postgres for a "data warehouse", 5-10 TB
On Sun, Sep 11, 2011 at 6:35 AM, Igor Chudov <ichudov@gmail.com> wrote:
> I have a server with about 18 TB of storage and 48 GB of RAM, and 12
> CPU cores.

1 or 2 fast cores is plenty for what you're doing. But the drive
array and how it's configured etc. are very important. There's a huge
difference between 10 2TB 7200 RPM SATA drives in software RAID-5 and
36 500GB 15k RPM SAS drives in RAID-10 (SW or HW would both be ok for
a data warehouse).

> I do not know much about Postgres, but I am very eager to learn and
> see if I can use it for my purposes more effectively than MySQL.
> I cannot shell out $47,000 per CPU for Oracle for this project.
> To be more specific, the batch queries that I would do, I hope,

Hopefully, if need be, you can spend some small percentage of that on
a fast IO subsystem.

> would either use small JOINS of a small dataset to a large dataset, or
> just SELECTS from one big table.
> So... Can Postgres support a 5-10 TB database with the use pattern
> stated above?

I use it on a ~3TB DB and it works well enough. Fast IO is the key
here. Lots of drives in RAID-10, or HW RAID-6 if you don't do a lot of
random writing.

For a 10 TB table and a 3-hour window, the disks should sustain a transfer rate of about 1 GB/s (seqscan).
2011/9/11, Scott Marlowe <scott.marlowe@gmail.com>:
> On Sun, Sep 11, 2011 at 6:35 AM, Igor Chudov <ichudov@gmail.com> wrote:
>> I have a server with about 18 TB of storage and 48 GB of RAM, and 12
>> CPU cores.
>
> 1 or 2 fast cores is plenty for what you're doing. But the drive
> array and how it's configured etc are very important. There's a huge
> difference between 10 2TB 7200RPM SATA drives in a software RAID-5 and
> 36 500G 15kRPM SAS drives in a RAID-10 (SW or HW would both be ok for
> data warehouse.)
>
>> I do not know much about Postgres, but I am very eager to learn and
>> see if I can use it for my purposes more effectively than MySQL.
>> I cannot shell out $47,000 per CPU for Oracle for this project.
>> To be more specific, the batch queries that I would do, I hope,
>
> Hopefully if needs be you can spend some small percentage of that for
> a fast IO subsystem is needed.
>
>> would either use small JOINS of a small dataset to a large dataset, or
>> just SELECTS from one big table.
>> So... Can Postgres support a 5-10 TB database with the use pattern
>> stated above?
>
> I use it on a ~3TB DB and it works well enough. Fast IO is the key
> here. Lots of drives in RAID-10 or HW RAID-6 if you don't do a lot of
> random writing.
>
--
------------
pasman
On 09/11/2011 07:35 AM, Igor Chudov wrote:
> I have been a MySQL user for years, including owning a few
> multi-gigabyte databases for my websites, and using it to host
> algebra.com (about 12 GB database).
>
> I have had my ups and downs with MySQL. The ups were ease of use and
> decent performance for small databases such as algebra.com. The downs
> were things like twenty hour REPAIR TABLE operations on a 35 GB
> table, etc.
>
> Right now I have a personal (one user) project to create a 5-10
> Terabyte data warehouse. The largest table will consume the most space
> and will take, perhaps, 200,000,000 rows.
>
> I want to use it to obtain valuable business intelligence and to make
> money.
>
> I expect it to grow, never shrink, and to be accessed via batch
> queries. I do not care for batch queries to be super fast, for example
> an hour per query would be just fine.
>
> However, while an hour is fine, two weeks per query is NOT fine.
>
> I have a server with about 18 TB of storage and 48 GB of RAM, and 12
> CPU cores.
>
> My initial plan was to use MySQL, InnoDB, and deal with problems as
> they arise. Perhaps, say, I would implement my own joining
> procedures.
>
> After reading some disparaging stuff about InnoDB performance on large
> datasets, however, I am getting cold feet. I have a general feeling
> that, perhaps, I will not be able to succeed with MySQL, or, perhaps,
> with either MySQL and Postgres.
>
> I do not know much about Postgres, but I am very eager to learn and
> see if I can use it for my purposes more effectively than MySQL.
>
> I cannot shell out $47,000 per CPU for Oracle for this project.
>
> To be more specific, the batch queries that I would do, I hope,
> would either use small JOINS of a small dataset to a large dataset, or
> just SELECTS from one big table.
>
> So... Can Postgres support a 5-10 TB database with the use pattern
> stated above?
>
> Thanks!
>
> i

That is a scale or two larger than I have experience with. I converted my
website database from mysql to PG, and it has several db's between 1 and
10 gig. There are parts of the website that were faster with mysql, and
there are parts faster with PG. In one spot, because PG has superior join
support on select statements, I was able to change the code to generate a
single, more complicated sql statement, whereas mysql had to fire off
several simpler statements. It's a search screen where you can type in
15-ish different options. I was able to generate a single sql statement
which joins some eight tables with plenty of where clauses. PG runs it in
the blink of an eye. It's astonishing compared to the pain of mysql. If
you ever have to write your own join, or your own lookup function, that's
a failure of your database.

One spot that was slower was a batch insert of data. It's not so much
slower that it was a problem. I use COPY on PG vs prepared inserts on
mysql. It was pretty close, but mysql still won.

Seeing as you can set up and test both databases, have you considered a
trial run?

Things to watch for:

I think the same amount of data will use more disk space in PG than in
mysql.

Importing data into PG should use COPY and multiple connections at the
same time. PG will only use multi-core if you use multiple connections
(each connection uses one core).

Huge result sets (like a select statement that returns 1,000,000 rows)
will be slow.

PG is a much fuller database than mysql, and as such you can influence its
join types and function calls (table scan vs index, immutable function vs
stable, perl function vs sql). So if at first it appears slow, you have a
million options. I think the only option you have in mysql is to pull the
data back and code it yourself.

Upgrading to major versions of PG may or may not be painful. (mysql
sometimes works seamlessly between versions; it appears brilliant. But I
have had problems with an update, and when it goes bad, you don't have a
lot of options.) In the past PG's only method of upgrade was a full backup
of the old cluster and a restore into the new one. Things have gotten
better: there is new pg_upgrade support (still kinda new though), and
there is some 3rd party replication support where you replicate your 9.0
database to a new 9.1 database, and at some point you promote the new 9.1
database as the new master. Or something like that. I've only read posts
about it, never done it. But with that much data, you'll need an upgrade
plan.

All in all, if I can summarize my personal view: mysql is fast at the
expense of safety and usability (mysql still cannot do updates with
subselects). PG is safe and usable at the expense of speed, and you won't
be disappointed by the speed.

-Andy

On Sun, Sep 11, 2011 at 3:59 PM, Igor Chudov <ichudov@gmail.com> wrote:
> Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6
> configuration.
> They are managed by a 3WARE 9750 RAID CARD.
>
> I would say that I am not very concerned with linear relationship of read
> speed to disk speed. If that stuff is somewhat slow, it is OK with me.

With RAID 6 you'll have abysmal performance on write operations.
In data warehousing, there's lots of writes to temporary files, for
sorting and stuff like that.
You should either migrate to raid 10, or set up a separate array for
temporary files, perhaps raid 0.
On 09/11/2011 08:59 AM, Igor Chudov wrote:
> I do not plan to do a lot of random writing. My current design is that my
> perl scripts write to a temporary table every week, and then I do
> INSERT..ON DUPLICATE KEY UPDATE.
>
> By the way, does that INSERT UPDATE functionality or something like this
> exist in Postgres?
>
> i

You have two options:

1) write a function like:

create function doinsert(_id integer, _value text) returns void as $$
begin
    update thetable
       set value = _value
     where id = _id;
    if not found then
        insert into thetable(id, value) values (_id, _value);
    end if;
end;
$$ language plpgsql;

2) use two sql statements:

-- update the existing
update realTable set value = (select value from tmp where tmp.id = realTable.id)
 where exists (select value from tmp where tmp.id = realTable.id);

-- insert the missing
insert into realTable(id, value)
select id, value from tmp
 where not exists (select 1 from realTable where tmp.id = realTable.id);

-Andy

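A minimal sketch of how option 2 might be driven end to end, assuming the weekly data is first bulk-loaded into the staging table tmp with COPY (the table, column names and file path are hypothetical; this is not from the thread). UPDATE ... FROM is used here as the join form of Andy's correlated-subquery update:

BEGIN;

-- bulk-load this week's batch into the staging table (server-side path, hypothetical)
COPY tmp (id, value) FROM '/data/week37.csv' WITH CSV;

-- update rows that already exist in the real table
UPDATE realTable
   SET value = tmp.value
  FROM tmp
 WHERE tmp.id = realTable.id;

-- insert rows that are still missing
INSERT INTO realTable (id, value)
SELECT id, value
  FROM tmp
 WHERE NOT EXISTS (SELECT 1 FROM realTable WHERE realTable.id = tmp.id);

COMMIT;
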
On Sun, Sep 11, 2011 at 4:16 PM, Andy Colson <andy@squeakycode.net> wrote:
> Upgrading to major versions of PG may or may not be painful. (mysql
> sometimes works seamlessly between versions, it appears brilliant. But I
> have had problems with an update, and when it goes bad, you dont have a lot
> of options). In the past PG's only method of upgrade was a full backup of
> old, restore in new. Things have gotten better, there is new pg_upgrade
> support (still kinda new though), and there is some 3rd party replication
> support where you replicate your 9.0 database to a new 9.1 database, and at
> some point you promote the new 9.1 database as the new master. Or something
> like that. I've only read posts about it, never done it. But with that
> much data, you'll need an upgrade plan.

I have used slony to do database migration. It is a pain to set up,
but it saves you hours of downtime. Basically, you replicate your 9.0
database into a 9.1 slave while the 9.0 is still hot and working, so
you only have a very small downtime.

It's an option, but it's a lot of work to set up, only warranted if you
really cannot afford the downtime.

On 09/11/2011 09:21 AM, Igor Chudov wrote:
> On Sun, Sep 11, 2011 at 9:16 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Sun, Sep 11, 2011 at 3:59 PM, Igor Chudov <ichudov@gmail.com> wrote:
>>> Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6
>>> configuration.
>>> They are managed by a 3WARE 9750 RAID CARD.
>>>
>>> I would say that I am not very concerned with linear relationship of read
>>> speed to disk speed. If that stuff is somewhat slow, it is OK with me.
>>
>> With Raid 6 you'll have abysmal performance on write operations.
>> In data warehousing, there's lots of writes to temporary files, for
>> sorting and stuff like that.
>>
>> You should either migrate to raid 10, or set up a separate array for
>> temporary files, perhaps raid 0.
>
> Thanks. I will rebuild the RAID array early next week and I will see if I
> have a Raid 10 option with that card.
>
> Quantitatively, what would you say is the write speed difference between
> RAID 10 and RAID 6?

Note that using RAID 10, while faster, cuts your usable space in half.
12 2TB drives in raid 10 == 6 drives * 2TB == 12 TB total space. That's
not big enough, is it?

-Andy

On Sun, Sep 11, 2011 at 4:21 PM, Igor Chudov <ichudov@gmail.com> wrote:
> Quantitatively, what would you say is the write speed difference between
> RAID 10 and RAID 6?

https://support.nstein.com/blog/archives/73

There you can see a comparison with 4 drives, and raid 10 is twice as
fast. Since raid 5/6 doesn't scale write performance at all (it performs
as a single drive), it's quite expected. 12 drives would probably be
around 6 times as fast as raid 6. You definitely should do some
benchmarks to confirm, though.

And Andy is right, you'll have a lot less space. If raid 10 doesn't give
you enough room, just leave two spare drives for a raid 0 temporary
partition. That will be at least twice as fast as doing temporary tables
on the raid 6.

You'll obviously have to get creative, tons of options.

On Sun, Sep 11, 2011 at 17:23, Andy Colson <andy@squeakycode.net> wrote:
> On 09/11/2011 08:59 AM, Igor Chudov wrote:
>> By the way, does that INSERT UPDATE functionality or something like this exist in Postgres?
> You have two options:
> 1) write a function like:
> create function doinsert(_id integer, _value text) returns void as
> 2) use two sql statements:

Unfortunately both of these options have caveats. Depending on your
I/O speed, you might need to use multiple loader threads to saturate
the write bandwidth.

However, neither option is safe from race conditions. If you need to
load data from multiple threads at the same time, they won't see each
other's inserts (until commit) and thus cause unique violations. If
you could somehow partition their operation by some key, so threads
are guaranteed not to conflict each other, then that would be perfect.
The 2nd option given by Andy is probably faster.

You *could* code a race-condition-safe function, but that would be a
no-go on a data warehouse, since each call needs a separate
subtransaction which involves allocating a transaction ID.

----

Which brings me to another important point: don't do lots of small
write transactions, SAVEPOINTs or PL/pgSQL subtransactions. Besides
being inefficient, they introduce a big maintenance burden.

In PostgreSQL's MVCC, each tuple contains a reference to the 32-bit
transaction ID that inserted it (xmin). After hitting the maximum 32-bit
value, the transaction ID counter "wraps around". To prevent old rows
from appearing as new, a "vacuum freeze" process will run after passing
autovacuum_freeze_max_age transactions (200 million by default) to update
all old rows in your database. Using fewer transaction IDs means it runs
less often.

On small databases, this is usually not important. But on a 10TB data
warehouse, rewriting a large part of your database totally kills
performance for any other processes.

This is detailed in the documentation:
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

Regards,
Marti

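A quick way to see how close a cluster is getting to that forced freeze threshold (a standard catalog query, not something posted in the thread):

-- age() reports how many transaction IDs have been consumed since each
-- database was last fully frozen; compare against
-- autovacuum_freeze_max_age (200 million by default).
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
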
On Sun, Sep 11, 2011 at 9:16 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Sun, Sep 11, 2011 at 3:59 PM, Igor Chudov <ichudov@gmail.com> wrote:
>> Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6
>> configuration.
>> They are managed by a 3WARE 9750 RAID CARD.
>>
>> I would say that I am not very concerned with linear relationship of read
>> speed to disk speed. If that stuff is somewhat slow, it is OK with me.
>
> With Raid 6 you'll have abysmal performance on write operations.
> In data warehousing, there's lots of writes to temporary files, for
> sorting and stuff like that.
>
> You should either migrate to raid 10, or set up a separate array for
> temporary files, perhaps raid 0.

Thanks. I will rebuild the RAID array early next week and I will see if I
have a Raid 10 option with that card.

Quantitatively, what would you say is the write speed difference between
RAID 10 and RAID 6?

As someone who migrated a RAID 5 installation to RAID 10, I am getting far
better read and write performance on heavy calculation queries. Writing on
the RAID 5 really made things crawl. For lots of writing, I think RAID 10
is the best.

It should also be noted that I changed my filesystem from ext3 to XFS -
this is something you can look into as well.

Ogden
But yeah, we tried quite a few file systems, and XFS **for our workloads** performed better than everything else we tested, and RAID 10 is a given if you do any significant writing.
For really big data-warehousing, this document really helped us:
http://pgexperts.com/document.html?id=49
2011/9/11 pasman pasmański <pasman.p@gmail.com>:
> For 10 TB table and 3hours, disks should have a transfer about 1GB/s (seqscan).

Random data point. Our primary servers were built for OLTP with 48 cores
and 32 15k SAS drives. We started out on Arecas, but the Supermicro 1Us we
were using didn't provide enough cooling and the Arecas were burning out
after 2 to 4 months, so on those machines we pulled the Arecas and replaced
them with simple LSI SAS non-RAID cards. Both were RAID-10, the latter with
Linux software RAID.

With the Arecas the OLTP performance is outstanding, garnering us ~8500 tps
at 40 to 50 threads. However, sequential performance was just so-so, at
around read/write speeds of 500/350 MB/s. The SW RAID-10 can read AND write
at right around 1 GB/s. What it lacks in transactional throughput it more
than makes up for in sequential read/write performance.

Another data point. We had a big Oracle installation at my last job, and
OLAP queries were killing it midday, so I built a simple replication system
to grab rows from the big iron Oracle SUN box and shove them into a single
core P IV 2.xGHz machine with 4 120G SATA drives in SW RAID-10.

That machine handily beat the big iron Oracle machine at OLAP queries,
running in 20 minutes what was taking well over an hour for the big Oracle
machine to do, even during its (Oracle machine) off-peak load times.

* Igor Chudov (ichudov@gmail.com) wrote:
> Right now I have a personal (one user) project to create a 5-10
> Terabyte data warehouse. The largest table will consume the most space
> and will take, perhaps, 200,000,000 rows.

I run data-warehouse databases on that order (current largest single
instance is ~4TB running under 9.0.4). If the largest table is only
200M rows, PG should handle that quite well. Our data is partitioned by
month and each month is about 200M records and simple queries can run in
15-20 minutes (with a single thread), with complex windowing queries
(split up and run in parallel) finishing in a couple of hours.
> However, while an hour is fine, two weeks per query is NOT fine.

What's really, really, really useful are two things: EXPLAIN, and this
mailing list. :) Seriously, run EXPLAIN on your queries before you run
them and see if how the query is going to be executed makes sense.
Here's a real easy hint: if it says "External Sort" and has big numbers,
come talk to us here- that's about one of the worst things you can
possibly do. Of course, PG's going to avoid doing that, but you may
have written a query (unintentionally) which forces PG to do a sort, or
something else.
> I have a server with about 18 TB of storage and 48 GB of RAM, and 12
> CPU cores.

If you partition up your data and don't mind things running in different
transactions, you can definitely get a speed boost with PG by running
things in parallel. PG will handle that very well, in fact, if two
queries are running against the same table, PG will actually combine
them and only actually read the data from disk once.

> I cannot shell out $47,000 per CPU for Oracle for this project.

The above data warehouse was migrated from an Oracle-based system. :)
> To be more specific, the batch queries that I would do, I hope,
> would either use small JOINS of a small dataset to a large dataset, or
> just SELECTS from one big table.

Make sure that you set your 'work_mem' correctly - PG will use that to
figure out if it can hash the small table (you want that to happen,
trust me..). If you do end up having sorts, it'll also use the work_mem
value to figure out how much memory to use for sorting.
> So... Can Postgres support a 5-10 TB database with the use pattern
> stated above?

Yes, certainly.

Thanks,
Stephen
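
A minimal illustration of the two habits recommended above - checking the plan before running a big query, and raising work_mem only for the session that needs it. The table and column names are hypothetical:

-- look at the plan first; watch for external sorts over huge row counts
EXPLAIN
SELECT small.key, sum(big.amount)
  FROM big_table AS big
  JOIN small_table AS small ON small.key = big.key
 GROUP BY small.key;

-- give just this session more memory per sort/hash, then run the query
SET work_mem = '256MB';
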
On Mon, Sep 12, 2011 at 1:16 AM, Igor Chudov <ichudov@gmail.com> wrote:
> I could, say, set work_mem to 30 GB? (64 bit linux)

I don't think you'd want that. Remember, work_mem is the amount of memory
*per sort*. Queries can request several times that much memory, once per
sort they need to perform.

You can set it really high, but not 60% of your RAM - that wouldn't be
wise.

* Igor Chudov (ichudov@gmail.com) wrote:
> Can I partition data by month (or quarter), without that month being part of
> PRIMARY KEY?

The way partitioning works in PG is by using CHECK constraints. Not sure
if you're familiar with those (not sure if MySQL has them), so here's a
quick example:

Create a parent table. Then create two tables which inherit from that
parent table (this is more of an implementation detail than anything else;
the parent table is always empty, it's just there to be the single,
combined table that you run your select queries against). On each of the
two 'child' tables, create a CHECK constraint. On table1, you do:

alter table table1 add check (date < '2000-01-01');

On table2, you do:

alter table table2 add check (date >= '2000-01-01');

Once those are done, you can query against the 'parent' table with
something like:

select * from parent where date = '2010-01-01';

And PG will realize it only has to look at table2 to get the results for
that query. This means the partitioning can be more-or-less any check
constraint that will be satisfied by the data in the table (and PG will
check/enforce this) and that PG can figure out will eliminate a partition
from possibly having the data that matches the request.

Technically, this means that you could have all kinds of different ways
your data is split across the partitions, but remember that all the
constraints have to actually be TRUE. :) Typically, people do split based
on the PK, but it's not required (realize that PG doesn't support
cross-table PKs, so if you don't have CHECK constraints which make sure
that the tables don't cover the same PK value, you could end up with
duplicate values across the tables...).

> If this question sounds weird, I am asking because MySQL enforces this,
> which does not fit my data.

That part is a little strange..

> If I can keep my primary key to be the ID that I want (which comes with
> data), but still partition it by month, I will be EXTREMELY happy.

As I said above, the actual PK is going to be independent and in the
base/child tables. That said, yes, you could have the PK in each table be
whatever you want and use month to partition the 'main' table. You then
have to come up with some other way to make sure your PK is enforced,
however, or figure out a way to deal with things if it's not. Based on
what you've been describing, I'm afraid you'd have to actually search all
the partitions for a given ID on an update, to figure out if you're doing
an UPDATE or an INSERT... Unless, of course, the month is included in the
PK somewhere, or is in the incoming data and you can be 100% confident
that the incoming data is never wrong.. :)

> I am wondering, why?

Cost, and we had a real hard time (this was a while ago..) getting Oracle
to run decently on Linux, and the Sun gear was just too damn expensive.
Also, ease of maintenance - it takes a LOT less effort to keep a PG
database set up and running smoothly than an Oracle one, imv.

> I could, say, set work_mem to 30 GB? (64 bit linux)

You can, but you have to be careful with it, because PG will think it can
use 30GB for EACH sort in a given query, and in EACH hash in a given
query. What I would recommend is setting the default to something like
256MB and then looking at specific queries and bumping it up for those
queries when it's clear that it'll help the query and won't cause the
system to go into swap.

Note that you can set work_mem for a given session after you connect to
the database, just do:

set work_mem = '1GB';

in your session before running other queries. Doing that won't impact
other sessions.

Thanks,

Stephen

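A minimal DDL sketch of the inheritance-based scheme described above, using the same hypothetical table and column names (on 8.4 and later, the default constraint_exclusion = partition setting is what lets the planner skip children whose CHECK constraint rules them out):

create table parent (
    id    bigint,
    date  date,
    value numeric
);

-- one child per range, each enforcing its slice with a CHECK constraint
create table table1 (check (date <  '2000-01-01')) inherits (parent);
create table table2 (check (date >= '2000-01-01')) inherits (parent);

-- queries go through the parent; only the matching child is scanned
select * from parent where date = '2010-01-01';
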
* Claudio Freire (klaussfreire@gmail.com) wrote:
> I don't think you'd want that. Remember, work_mem is the amount of
> memory *per sort*.
> Queries can request several times that much memory, once per sort they
> need to perform.
>
> You can set it really high, but not 60% of your RAM - that wouldn't be wise.

Oh, I dunno.. It's only used by the planner, so sometimes you have to bump
it up, especially when PG thinks the number of rows returned from something
will be a lot more than it really will be. :)

/me has certain queries where it's been set to 100GB... ;)

I agree that it shouldn't be the default, however. That's asking for
trouble. Do it for the specific queries that need it.

Thanks,

Stephen

Hi,

On 12 September 2011 12:28, Stephen Frost <sfrost@snowman.net> wrote:
> Once those are done, you can query against the 'parent' table with
> something like:
> select * from parent where date = '2010-01-01';
>
> And PG will realize it only has to look at table2 to get the results for
> that query. This means the partitioning can be more-or-less any check
> constraint that will be satisfied by the data in the table (and PG will
> check/enforce this) and that PG can figure out will eliminate a partition
> from possibly having the data that matches the request.

Theory is nice, but there are a few gotchas (in 8.4) - the first one is
illustrated in the sketch after this message:

- The planner can use constant expressions only. You will get scans across
  all partitions when you use a function (like now(), or an immutable
  function with constant arguments), a subquery (like
  part_col = (select x from ...)), or anything else which can't be
  evaluated to a constant during query planning.

- Partition constraints are not "pushed to joins" (assuming tables
  partitioned by primary key):

  select ... from X left join Y on X.primary_key = Y.primary_key
  where X.primary_key >= ... and X.primary_key < ...

  must be rewritten like

  select ... from X left join Y on X.primary_key = Y.primary_key
    and Y.primary_key >= ... and Y.primary_key < ...
  where X.primary_key >= ... and X.primary_key < ...

  in order to avoid scanning the entire Y table (not only the relevant
  partitions).

- The ORDER BY / LIMIT X issue is fixed in 9.1 ("Allow inheritance table
  scans to return meaningfully-sorted results").

Moreover, all queries should have a WHERE clause on the column which is
used for partitioning, otherwise partitioning is not very useful (yes, it
could still simplify data management -- drop partition vs
delete from X where part_col between A and B).

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)

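A sketch of that first gotcha, reusing the hypothetical parent/part_col names from the messages above. On 8.4, the second plan can show every child being scanned because the value is not a plan-time constant:

-- literal constant: the planner can prove which partitions to skip
explain select * from parent where part_col = date '2011-09-01';

-- value computed at run time (stable function / subquery): on 8.4 the
-- planner cannot exclude partitions, so expect scans of all children
explain select * from parent where part_col = now()::date;
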
On 09/11/2011 09:44 AM, Claudio Freire wrote:
> And Andy is right, you'll have a lot less space. If raid 10 doesn't
> give you enough room, just leave two spare drives for a raid 0
> temporary partition. That will be at least twice as fast as doing
> temporary tables on the raid 6.

Alternatively, throw a lot of memory at the system and point the temp
space at /dev/shm. We've had really good luck doing that here, to avoid
excessive writes to our NVRAM PCIe cards.

Make sure the transaction logs (and any archives) get written to a
separate LUN (ideally on a separate controller) for even more win.

--
Shaun Thomas

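A sketch of one way to point temp space at a RAM-backed filesystem as suggested above (the mount point and tablespace name are hypothetical; the directory must already exist and be owned by the postgres user):

-- one-time setup as a superuser
CREATE TABLESPACE temp_ram LOCATION '/dev/shm/pg_temp';

-- then direct temporary files and temp tables there, per session or in
-- postgresql.conf
SET temp_tablespaces = 'temp_ram';

One caveat worth stating: anything under /dev/shm disappears on reboot, which is fine for sort spill files and temp tables but not for anything durable.
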
On 09/11/2011 12:02 PM, Marti Raudsepp wrote:
> Which brings me to another important point: don't do lots of small
> write transactions, SAVEPOINTs or PL/pgSQL subtransactions. Besides
> being inefficient, they introduce a big maintenance burden.

I'd like to second this. Before a notable application overhaul, we were
handling about 300 million transactions per day (250M of that was over a
6-hour period). To avoid the risk of mid-day vacuum freeze, we disabled
autovacuum and run a nightly vacuum over the entire database. And that was
*after* bumping autovacuum_freeze_max_age to 600 million.

You do *not* want to screw with that if you don't have to, and a setting
of 600M is about 1/3 of the reasonable boundary there. If not for the
forced autovacuums, a database with this much traffic would be corrupt in
less than a week. We've managed to cut that transaction traffic by 60%,
and it greatly improved the database's overall health.

--
Shaun Thomas

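For reference, a postgresql.conf excerpt matching the setup described above; the values come from the message, the comments are assumptions about how such a setup is typically wired together:

# autovacuum disabled during the day; a manual VACUUM runs nightly instead
autovacuum = off
# raised from the 200000000 default to push the forced freeze further out
autovacuum_freeze_max_age = 600000000

The nightly job in such a setup would simply run a database-wide VACUUM (optionally VACUUM ANALYZE) during the quiet window.
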
On 09/11/2011 09:27 AM, Claudio Freire wrote:
> I have used slony to do database migration. It is a pain to set up,
> but it saves you hours of downtime.

I've had to shoot this option down in two separate upgrade scenarios in
two different companies. Theoretically it's possible, but slony is based
on triggers. If you have an OLTP database with frequent writes, that
overhead (firing the trigger, storing the replicated data, reading the
replication log, traffic to the upgrade node) can literally kill your
application.

Downtime is one thing, but maintenance windows can be planned. Ruining
application performance for an undetermined length of time is probably
worse. Thankfully 8.4 added pg_migrator/pg_upgrade, so that kind of pain
is probably over for the most part. But even without it, 8.4 and above
have parallel restore, which can drop upgrade times down to a fraction of
their former length.

--
Shaun Thomas

On 11.09.2011 22:10, Scott Marlowe wrote:
> Another data point. We had a big Oracle installation at my last job,
> and OLAP queries were killing it midday, so I built a simple
> replication system to grab rows from the big iron Oracle SUN box and
> shove into a single core P IV 2.xGHz machine with 4 120G SATA drives
> in SW RAID-10.
>
> That machine handily beat the big iron Oracle machine at OLAP queries,
> running in 20 minutes what was taking well over an hour for the big
> Oracle machine to do, even during its (Oracle machine) off peak load
> times.

Um, that sounds as if the SUN setup was really bad. Do you remember any
details about the drive configuration there?

Kind regards

robert

On 11.09.2011 19:02, Marti Raudsepp wrote:
> On Sun, Sep 11, 2011 at 17:23, Andy Colson <andy@squeakycode.net> wrote:
>> On 09/11/2011 08:59 AM, Igor Chudov wrote:
>>> By the way, does that INSERT UPDATE functionality or something like this exist in Postgres?
>> You have two options:
>> 1) write a function like:
>> create function doinsert(_id integer, _value text) returns void as
>> 2) use two sql statements:
>
> Unfortunately both of these options have caveats. Depending on your
> I/O speed, you might need to use multiple loader threads to saturate
> the write bandwidth.
>
> However, neither option is safe from race conditions. If you need to
> load data from multiple threads at the same time, they won't see each
> other's inserts (until commit) and thus cause unique violations. If
> you could somehow partition their operation by some key, so threads
> are guaranteed not to conflict each other, then that would be perfect.
> The 2nd option given by Andy is probably faster.
>
> You *could* code a race-condition-safe function, but that would be a
> no-go on a data warehouse, since each call needs a separate
> subtransaction which involves allocating a transaction ID.

Wouldn't it be sufficient to reverse the order for race condition safety?
Pseudo code:

begin
  insert ...
catch
  update ...
  if not found error
end

Speed is another matter though...

Kind regards

robert

On 9/12/2011 12:15 PM, Robert Klemme wrote:
> On 11.09.2011 19:02, Marti Raudsepp wrote:
>> On Sun, Sep 11, 2011 at 17:23, Andy Colson<andy@squeakycode.net> wrote:
>>> On 09/11/2011 08:59 AM, Igor Chudov wrote:
>>>> By the way, does that INSERT UPDATE functionality or something like
>>>> this exist in Postgres?
>>> You have two options:
>>> 1) write a function like:
>>> create function doinsert(_id integer, _value text) returns void as
>>> 2) use two sql statements:
>>
>> Unfortunately both of these options have caveats. Depending on your
>> I/O speed, you might need to use multiple loader threads to saturate
>> the write bandwidth.
>>
>> However, neither option is safe from race conditions. If you need to
>> load data from multiple threads at the same time, they won't see each
>> other's inserts (until commit) and thus cause unique violations. If
>> you could somehow partition their operation by some key, so threads
>> are guaranteed not to conflict each other, then that would be perfect.
>> The 2nd option given by Andy is probably faster.
>>
>> You *could* code a race-condition-safe function, but that would be a
>> no-go on a data warehouse, since each call needs a separate
>> subtransaction which involves allocating a transaction ID.
>
> Wouldn't it be sufficient to reverse order for race condition safety?
> Pseudo code:
>
> begin
>   insert ...
> catch
>   update ...
>   if not found error
> end
>
> Speed is another matter though...

No, I don't think so. If you had two loaders, both would start a
transaction, then neither could see what the other was doing. There are
transaction isolation levels, but they are like playing with fire (in my
opinion).

-Andy

On Mon, Sep 12, 2011 at 11:04 AM, Robert Klemme
<shortcutter@googlemail.com> wrote:
> On 11.09.2011 22:10, Scott Marlowe wrote:
>> Another data point. We had a big Oracle installation at my last job,
>> and OLAP queries were killing it midday, so I built a simple
>> replication system to grab rows from the big iron Oracle SUN box and
>> shove into a single core P IV 2.xGHz machine with 4 120G SATA drives
>> in SW RAID-10.
>>
>> That machine handily beat the big iron Oracle machine at OLAP queries,
>> running in 20 minutes what was taking well over an hour for the big
>> Oracle machine to do, even during its (Oracle machine) off peak load
>> times.
>
> Um, that sounds as if the SUN setup was really bad. Do you remember any
> details about the drive configuration there?

It was actually set up quite well. A very fast SAN with individual drive
arrays etc. It was VERY fast at transactional throughput. BUT it was not
set up for massive OLAP work. The drives that housed the statistical data
we were running OLAP against were the slowest in the set, since they were
made to mostly just take in a small amount of data each minute from the
java servers. Originally the stats had been on a pg server in production
and very fast, but some political decision moved it onto the Oracle
server. The Oracle DBA wasn't any happier with this move than me, btw.

On Mon, Sep 12, 2011 at 10:22 AM, Shaun Thomas <sthomas@peak6.com> wrote:
> On 09/11/2011 12:02 PM, Marti Raudsepp wrote:
>> Which brings me to another important point: don't do lots of small
>> write transactions, SAVEPOINTs or PL/pgSQL subtransactions. Besides
>> being inefficient, they introduce a big maintenance burden.
>
> I'd like to second this. Before a notable application overhaul, we were
> handling about 300-million transactions per day (250M of that was over a
> 6-hour period). To avoid the risk of mid-day vacuum-freeze, we disabled
> autovacuum and run a nightly vacuum over the entire database. And that was
> *after* bumping autovacuum_freeze_max_age to 600-million.
>
> You do *not* want to screw with that if you don't have to, and a setting of
> 600M is about 1/3 of the reasonable boundary there. If not for the forced
> autovacuums, a database with this much traffic would be corrupt in less than
> a week. We've managed to cut that transaction traffic by 60%, and it greatly
> improved the database's overall health.

I put it to you that your hardware has problems if you have a pg db
that's corrupting from having too much vacuum activity. I've had exactly
one pg corruption problem in the past, and it was a bad SATA hard drive
on a stats server. I have four 48 core opterons running quite hard during
the day, have autovacuum on and VERY aggressively tuned, and have had
zero corruption issues in over 3 years of hard running.

On 09/12/2011 02:48 PM, Scott Marlowe wrote:
> I put it to you that your hardware has problems if you have a pg db
> that's corrupting from having too much vacuum activity.

What? No. We optimized by basically forcing autovacuum to never run
during our active periods. We never actually encountered wrap-around
corruption. I was just saying that 600M is a relatively high setting for
autovacuum_freeze_max_age. :)

I was alluding to the fact that if a DBA had his system running for a
week at our transaction level, and PG didn't have forced auto vacuum, and
their maintenance lapsed even slightly, they could end up with a corrupt
database. Not too far-fetched for someone coming from MySQL, really.

Our problem is we run a financial site, and the front-end very
aggressively monitors network and database timeouts. The limit is
sufficiently low that a vacuum would cause enough IO to trigger
application timeouts, even with vacuum_cost_delay. And of course, setting
vacuum_cost_delay too high quickly triples or quadruples vacuum times.
Now that we're using FusionIO cards, I've been thinking about turning
autovacuum back on, but I want to run some tests first.

My point stands, though. Don't go crazy with transactions until you know
your config can stand up to it, and reduce if possible. We found some
tweak points that drastically reduced transaction count with no
detrimental effect on the app itself, so we jumped on them.

--
Shaun Thomas

On Mon, Sep 12, 2011 at 23:04, Shaun Thomas <sthomas@peak6.com> wrote:
> I was alluding to the fact that if a DBA had his system running for a week
> at our transaction level, and PG didn't have forced auto vacuum, and their
> maintenance lapsed even slightly, they could end up with a corrupt database.

It doesn't actually corrupt your database. If you manage to hit the
wraparound age, PostgreSQL disallows new connections and tells you to run
a VACUUM from a standalone backend. (But that should never happen due to
the forced vacuum freeze processes.)

Regards,
Marti

On Mon, Sep 12, 2011 at 2:04 PM, Shaun Thomas <sthomas@peak6.com> wrote:
> On 09/12/2011 02:48 PM, Scott Marlowe wrote:
>> I put it to you that your hardware has problems if you have a pg db
>> that's corrupting from having too much vacuum activity.
>
> What? No. We optimized by basically forcing autovacuum to never run during
> our active periods. We never actually encountered wrap-around corruption. I
> was just saying that 600M is a relatively high setting for
> autovacuum_freeze_max_age. :)

You don't get corruption from wrap around, you get a database that stops
and tells you to run a vacuum by hand on a single user backend, and it
won't come up until you do.

You throw around the word corruption a lot. The PostgreSQL team works
REALLY hard to prevent any kind of corruption scenario from rearing its
ugly head, so when the word corruption pops up I start to wonder about the
system (hardware wise) someone is using, since only killing the postmaster
by hand, then deleting the interlock file and starting a new postmaster
while old postgres children are still active is just about the only way to
corrupt pgsql, short of using vi on one of the files in /data/base/xxx/yyy
etc.

> I was alluding to the fact that if a DBA had his system running for a week
> at our transaction level, and PG didn't have forced auto vacuum, and their
> maintenance lapsed even slightly, they could end up with a corrupt database.
> Not too far-fetched for someone coming from MySQL, really.
>
> Our problem is we run a financial site, and the front-end very aggressively
> monitors network and database timeouts. The limit is sufficiently low that a
> vacuum would cause enough IO to trigger application timeouts, even with
> vacuum_cost_delay. And of course, setting vacuum_cost_delay too high quickly
> triples or quadruples vacuum times. Now that we're using FusionIO cards,
> I've been thinking about turning autovacuum back on, but I want to run some
> tests first.
>
> My point stands, though. Don't go crazy with transactions until you know
> your config can stand up to it, and reduce if possible. We found some tweak
> points that drastically reduced transaction count with no detrimental effect
> on the app itself, so we jumped on them.

--
To understand recursion, one must first understand recursion.

On 09/12/2011 03:44 PM, Scott Marlowe wrote:
> The PostgreSQL team works REALLY hard to prevent any kind of
> corruption scenario from rearing its ugly head, so when the word
> corruption pops up I start to wonder about the system (hardware
> wise) someone is using,

You've apparently never used early versions of EnterpriseDB. ;)

Kidding aside, it's apparently been a while since I read that particular
part of the manual. The error I *was* familiar with was from the 8.0
manual:

"WARNING: some databases have not been vacuumed in 1613770184 transactions
HINT: Better vacuum them within 533713463 transactions, or you may have a
wraparound failure."

Ever since the early days, I've been so paranoid about regular vacuuming,
I'm probably still a little overcautious. So, my bad. Having a database
down for a few hours isn't exactly desirable, but it's certainly not
corruption. :)

--
Shaun Thomas

On Mon, Sep 12, 2011 at 2:55 PM, Shaun Thomas <sthomas@peak6.com> wrote:
> On 09/12/2011 03:44 PM, Scott Marlowe wrote:
>> The PostgreSQL team works REALLY hard to prevent any kind of
>> corruption scenario from rearing its ugly head, so when the word
>> corruption pops up I start to wonder about the system (hardware
>> wise) someone is using,
>
> You've apparently never used early versions of EnterpriseDB. ;)
>
> Kidding aside, it's apparently been a while since I read that particular
> part of the manual. The error I *was* familiar with was from the 8.0 manual:
>
> "WARNING: some databases have not been vacuumed in 1613770184 transactions
> HINT: Better vacuum them within 533713463 transactions, or you may have a
> wraparound failure."
>
> Ever since the early days, I've been so paranoid about regular vacuuming,
> I'm probably still a little overcautious.
>
> So, my bad. Having a database down for a few hours isn't exactly desirable,
> but it's certainly not corruption. :)

No biggie, more a question of semantics. Just a trigger word for me. I
started with pgsql 6.5.2 so I know ALL ABOUT corruption. hehe.

On 12.09.2011 19:22, Andy Colson wrote:
> On 9/12/2011 12:15 PM, Robert Klemme wrote:
>> On 11.09.2011 19:02, Marti Raudsepp wrote:
>>> On Sun, Sep 11, 2011 at 17:23, Andy Colson<andy@squeakycode.net> wrote:
>>>> On 09/11/2011 08:59 AM, Igor Chudov wrote:
>>>>> By the way, does that INSERT UPDATE functionality or something like
>>>>> this exist in Postgres?
>>>> You have two options:
>>>> 1) write a function like:
>>>> create function doinsert(_id integer, _value text) returns void as
>>>> 2) use two sql statements:
>>>
>>> Unfortunately both of these options have caveats. Depending on your
>>> I/O speed, you might need to use multiple loader threads to saturate
>>> the write bandwidth.
>>>
>>> However, neither option is safe from race conditions. If you need to
>>> load data from multiple threads at the same time, they won't see each
>>> other's inserts (until commit) and thus cause unique violations. If
>>> you could somehow partition their operation by some key, so threads
>>> are guaranteed not to conflict each other, then that would be perfect.
>>> The 2nd option given by Andy is probably faster.
>>>
>>> You *could* code a race-condition-safe function, but that would be a
>>> no-go on a data warehouse, since each call needs a separate
>>> subtransaction which involves allocating a transaction ID.
>>
>> Wouldn't it be sufficient to reverse order for race condition safety?
>> Pseudo code:
>>
>> begin
>>   insert ...
>> catch
>>   update ...
>>   if not found error
>> end
>>
>> Speed is another matter though...
>
> No, I dont think so, if you had two loaders, both would start a
> transaction, then neither could see what the other was doing.

It depends. But the point is that not both INSERTs can succeed. The one
which fails will attempt the UPDATE and - depending on isolation level and
DB implementation - will be blocked or fail.

In the case of PG this particular example will work:

1. TX inserts new PK row
2. TX tries to insert same PK row => blocks
1. TX commits
2. TX fails with PK violation
2. TX does the update (if the error is caught)

> There are transaction isolation levels, but they are like playing with
> fire. (in my opinion).

You make them sound like witchcraft. But they are clearly defined - even
standardized. Granted, different RDBMS might implement them in different
ways - here's PG's view of TX isolation:

http://www.postgresql.org/docs/8.4/interactive/transaction-iso.html

In my opinion anybody working with RDBMS should make himself familiar with
this concept - at least know about it - because it is one of the
fundamental features of RDBMS and certainly needs consideration in
applications with highly concurrent DB activity.

Kind regards

robert

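For concreteness, a minimal PL/pgSQL sketch of the insert-first pattern in the pseudo code above. The table t(id, dat) and function name are hypothetical; the EXCEPTION block is the part the following replies discuss, since it opens a subtransaction per call:

create function upsert_t(a_id integer, a_dat text) returns void as $$
begin
    -- try the insert first; if another transaction already inserted this
    -- key, we get a unique_violation and fall back to an update
    insert into t (id, dat) values (a_id, a_dat);
exception
    when unique_violation then
        update t set dat = a_dat where id = a_id;
        if not found then
            raise exception 'row for id % vanished between insert and update', a_id;
        end if;
end;
$$ language plpgsql;
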
On Tue, Sep 13, 2011 at 00:26, Robert Klemme <shortcutter@googlemail.com> wrote:
> In the case of PG this particular example will work:
> 1. TX inserts new PK row
> 2. TX tries to insert same PK row => blocks
> 1. TX commits
> 2. TX fails with PK violation
> 2. TX does the update (if the error is caught)

That goes against the point I was making in my earlier comment. In order
to implement this error-catching logic, you'll have to allocate a new
subtransaction (transaction ID) for EVERY ROW you insert. If you're going
to be loading billions of rows this way, you will invoke the wrath of the
"vacuum freeze" process, which will seq-scan all older tables and re-write
every row that it hasn't touched yet. You'll survive it if your database
is a few GB in size, but in terabyte land that's unacceptable. Transaction
IDs are a scarce resource there.

In addition, such blocking will limit the parallelism you will get from
multiple inserters.

Regards,
Marti

On Tue, Sep 13, 2011 at 5:13 PM, Marti Raudsepp <marti@juffo.org> wrote:
> On Tue, Sep 13, 2011 at 00:26, Robert Klemme <shortcutter@googlemail.com> wrote:
>> In the case of PG this particular example will work:
>> 1. TX inserts new PK row
>> 2. TX tries to insert same PK row => blocks
>> 1. TX commits
>> 2. TX fails with PK violation
>> 2. TX does the update (if the error is caught)
>
> That goes against the point I was making in my earlier comment. In
> order to implement this error-catching logic, you'll have to allocate
> a new subtransaction (transaction ID) for EVERY ROW you insert.

I don't think so. You only need to catch the error (see attachment).
Or does this create a sub transaction?

> If
> you're going to be loading billions of rows this way, you will invoke
> the wrath of the "vacuum freeze" process, which will seq-scan all
> older tables and re-write every row that it hasn't touched yet. You'll
> survive it if your database is a few GB in size, but in the terabyte
> land that's unacceptable. Transaction IDs are a scarce resource there.

Certainly. But it's not needed as far as I can see.

> In addition, such blocking will limit the parallelism you will get
> from multiple inserters.

Yes, I mentioned the speed issue. But regardless of the solution for
MySQL's "INSERT..ON DUPLICATE KEY UPDATE" which Igor mentioned, you
will have the locking problem anyhow if you plan to insert concurrently
into the same table and be robust.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Hi,
> (see attachment)
under high concurrency you may expect that your data is already in.
In such a case you better do nothing at all:
begin
    select dat = a_dat into test from t where id = a_id;
    if test is null then
        begin
            insert into t (id, dat) values (a_id, a_dat);
        exception
            when unique_violation then
                update t set dat = a_dat where id = a_id and dat <> a_dat;
                return 0;
        end;
    elsif not test then
        update t set dat = a_dat where id = a_id;
        return 0;
    end if;
    return 1;
end;
best regards,
Marc Mamin
On Tue, Sep 13, 2011 at 19:34, Robert Klemme <shortcutter@googlemail.com> wrote:
> I don't think so. You only need to catch the error (see attachment).
> Or does this create a sub transaction?
Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a
SAVEPOINT it can roll back to in case of an error.
> Yes, I mentioned the speed issue. But regardless of the solution for
> MySQL's "INSERT..ON DUPLICATE KEY UPDATE" which Igor mentioned you
> will have the locking problem anyhow if you plan to insert
> concurrently into the same table and be robust.
In a mass-loading application you can often divide the work between
threads in a manner that doesn't cause conflicts.
For example, if the unique key is foobar_id and you have 4 threads,
thread 0 will handle rows where (foobar_id%4)=0, thread 1 takes
(foobar_id%4)=1 etc. Or potentially hash foobar_id before dividing the
work.
I already suggested this in my original post.
Regards,
Marti
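Sketched in SQL for a staging-table load (the tables staging and target and the column payload are made up for illustration), each of the four loader sessions would run its own slice:

-- loader session N (N = 0, 1, 2, 3):
INSERT INTO target (foobar_id, payload)
SELECT foobar_id, payload
FROM   staging
WHERE  foobar_id % 4 = 0;   -- 0 replaced by 1, 2, 3 in the other sessions

Since every session handles a disjoint set of foobar_id values, no two sessions can collide on the unique key.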
Interesting debate.
2011/9/13 Marti Raudsepp <marti@juffo.org>:
> Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a
> SAVEPOINT it can roll back to in case of an error.
Are you sure? In theory I always understood that there are no
"subtransactions".
In fact when looking at the docs there is chapter 39.6.6. saying "By
default, any error occurring in a PL/pgSQL function aborts execution
of the function, and indeed of the surrounding transaction as well.
You can trap errors and recover from them by using a BEGIN block with
an EXCEPTION clause."
(http://www.postgresql.org/docs/current/interactive/plpgsql-control-structures.html)
So the doc isn't totally explicit about this. But whatever: what would
be the function of a subtransaction? To give the possibility to
recover and continue within the surrounding transaction?
Stefan
On Tue, Sep 13, 2011 at 12:57 PM, Stefan Keller <sfkeller@gmail.com> wrote:
> Are you sure? In theory I always understood that there are no
> "subtransactions".
"subtransaction" is just another way of saying save points / rollback.
On 13.09.2011 20:11, Marti Raudsepp wrote:
> On Tue, Sep 13, 2011 at 19:34, Robert Klemme <shortcutter@googlemail.com> wrote:
>> I don't think so. You only need to catch the error (see attachment).
>> Or does this create a sub transaction?
>
> Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a
> SAVEPOINT it can roll back to in case of an error.
Ouch! Learn something new every day. Thanks for the update!
http://www.postgresql.org/docs/8.4/interactive/plpgsql-structure.html
Side note: it seems that Oracle handles this differently (i.e. no
subtransaction, but the INSERT would be rolled back), making the
pattern pretty usable for this particular situation. Also, I have never
heard that TX ids are such a scarce resource over there.
Would anybody think it a good idea to optionally have a BEGIN/EXCEPTION
block without the current TX semantics? In the absence of that, what
would be a better pattern (other than UPDATE and INSERT if not found)?
>> Yes, I mentioned the speed issue. But regardless of the solution for
>> MySQL's "INSERT..ON DUPLICATE KEY UPDATE" which Igor mentioned you
>> will have the locking problem anyhow if you plan to insert
>> concurrently into the same table and be robust.
>
> In a mass-loading application you can often divide the work between
> threads in a manner that doesn't cause conflicts.
Yeah, but concurrency might not be the only reason to optionally
update. If the data is already there you might rather want to overwrite
it instead of failing.
Kind regards
robert
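The "UPDATE and INSERT if not found" pattern Robert mentions would look roughly like this in PL/pgSQL (a sketch, reusing the hypothetical t, a_id and a_dat from Marc's fragment above):

-- without a retry loop this can still raise a duplicate key error if
-- another session inserts the same key between the UPDATE and the INSERT
update t set dat = a_dat where id = a_id;
if not found then
    insert into t (id, dat) values (a_id, a_dat);
end if;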
On 13.09.2011 20:57, Stefan Keller wrote:
> Interesting debate.
Indeed.
> 2011/9/13 Marti Raudsepp <marti@juffo.org>:
>> Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a
>> SAVEPOINT it can roll back to in case of an error.
>
> Are you sure? In theory I always understood that there are no
> "subtransactions".
What theory are you referring to?
> In fact when looking at the docs there is chapter 39.6.6. saying "By
> default, any error occurring in a PL/pgSQL function aborts execution
> of the function, and indeed of the surrounding transaction as well.
> You can trap errors and recover from them by using a BEGIN block with
> an EXCEPTION clause."
> (http://www.postgresql.org/docs/current/interactive/plpgsql-control-structures.html)
>
> So the doc isn't totally explicit about this. But whatever: what would
> be the function of a subtransaction? To give the possibility to
> recover and continue within the surrounding transaction?
I find this pretty explicit:
"It is important not to confuse the use of BEGIN/END for grouping
statements in PL/pgSQL with the similarly-named SQL commands for
transaction control. PL/pgSQL's BEGIN/END are only for grouping; they
do not start or end a transaction. Functions and trigger procedures are
always executed within a transaction established by an outer query --
they cannot start or commit that transaction, since there would be no
context for them to execute in. However, a block containing an
EXCEPTION clause effectively forms a subtransaction that can be rolled
back without affecting the outer transaction. For more about that see
Section 38.6.5."
http://www.postgresql.org/docs/8.4/interactive/plpgsql-structure.html
Cheers
robert
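In plain SQL the recover-and-continue behaviour the documentation describes is what SAVEPOINT provides; a BEGIN ... EXCEPTION block behaves roughly like this sketch (assuming a table t in which a row with id = 1 already exists):

BEGIN;
SAVEPOINT try_insert;
INSERT INTO t (id, dat) VALUES (1, 10);   -- fails with a duplicate key error
ROLLBACK TO SAVEPOINT try_insert;         -- undoes only the failed INSERT
UPDATE t SET dat = 10 WHERE id = 1;       -- the outer transaction carries on
COMMIT;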
Robert Klemme wrote:
> On 12.09.2011 19:22, Andy Colson wrote:
>> There are transaction isolation levels, but they are like playing
>> with fire. (in my opinion).
> You make them sound like witchcraft. But they are clearly defined
> - even standardized.
Yeah, for decades. Developing concurrency control from scratch at the
application level over and over again is more like playing with fire,
in my book.
> Granted, different RDBMS might implement them in different ways -
> here's PG's view of TX isolation:
> http://www.postgresql.org/docs/8.4/interactive/transaction-iso.html
Oh, that link is *so* day-before-yesterday! Try this one:
http://www.postgresql.org/docs/9.1/interactive/transaction-iso.html
> In my opinion anybody working with RDBMS should make himself
> familiar with this concept - at least know about it - because it
> is one of the fundamental features of RDBMS and certainly needs
> consideration in applications with highly concurrent DB activity.
+1
Understanding what levels of transaction isolation are available, and
what the implications of each are, is fundamental. Just as there are
cases where a foreign key constraint doesn't exactly work for what you
need to enforce, there are cases where serializable transactions don't
fit. But where they do fit, developing the equivalent from scratch all
over again is not as safe or productive as using the built-in feature.
-Kevin
On Mon, Sep 12, 2011 at 11:26:10PM +0200, Robert Klemme wrote:
> You make them sound like witchcraft. But they are clearly defined -
> even standardized. Granted, different RDBMS might implement them in
> different ways - here's PG's view of TX isolation:
>
> http://www.postgresql.org/docs/8.4/interactive/transaction-iso.html
Even better: PostgreSQL 9.1 (Released yesterday! Fresher than milk...)
ships an improved algorithm for the serializable transaction isolation
level:
http://www.postgresql.org/docs/9.1/interactive/transaction-iso.html
More info: http://wiki.postgresql.org/wiki/Serializable
Cheers,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.ciolli@2ndquadrant.it | www.2ndquadrant.it
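For reference, the isolation level is chosen per transaction, or set as a default:

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- statements that must behave as if the transaction ran alone
COMMIT;

-- or as a server-wide default in postgresql.conf:
-- default_transaction_isolation = 'serializable'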