Обсуждение: settings for multi-language unicode DB
Please could somebody give me advice on settings for PGSQL with a database with UTF-8 strings in a large number of languages? I've had no problems so far in storing/retrieving UTF-8 strings but can't find clear answers to other issues. What locale would be best for sorting this dataset? Does it have to be the same as the one set for the OS (these seem very limited for Solaris)? As well as European accented characters, I need to handle some others used for transcription. I have to produce simple to use, sorted lists by initial letter for users and staff. The database is for a library which takes journals from all over the world and has to store titles in all european langauges (broadly defined) plus western transcriptions of others. We are in the process of moving to unicode. The DBMS currently runs on Solaris 9 with a backup system on Fedora 3 Linux and user/editor access is via servlets (Java). Thanks for any help. Janet Bagg, CSAC, University of Kent, UK
Janet Bagg wrote: > Please could somebody give me advice on settings for PGSQL with a database > with UTF-8 strings in a large number of languages? I've had no problems so > far in storing/retrieving UTF-8 strings but can't find clear answers to > other issues. > > What locale would be best for sorting this dataset? Does it have to be the > same as the one set for the OS (these seem very limited for Solaris)? Yes, right now PostgreSQL uses the operating system to do locale ordering. That might change in a future release. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
I haven't seen anything to indicate that 8.x improves foreign key refererence locking and fixes the foreign key 'deadlock detected' issue. Has that been addressed in 8.1? I sort my records before loading, but am still getting 'deadlock detected' (still running 7.3.4). I have B references C B references A A references C Any new records for C are loaded first (both A and B depend on C). A is loaded next (B depends on A). B is loaded last. Both A and B are loaded in 'Table C' sorted order. However, a deadlock can occur with one process's A records and another's B records, due to the fact that both have a foreign key reference to C. The only way I know of to eliminate the deadlocks without serializing the processes is to remove the foreign key reference from either A or B. However, jeopardizes referential integrity if something goes wrong with the load process. I've never quite understood why a READ of a record with a foreign key reference results in the referenced record being locked with more than a shared lock. Wes
On Fri, 25 Nov 2005, Wes wrote: > I haven't seen anything to indicate that 8.x improves foreign key > refererence locking and fixes the foreign key 'deadlock detected' issue. > Has that been addressed in 8.1? 8.1 should be using the new shared row locks for doing the checks. This should fix the case mentioned. > I've never quite understood why a READ of a record with a foreign key > reference results in the referenced record being locked with more than a > shared lock. Up until now, we didn't have one to get on a per-record basis.
On 11/25/05 2:40 AM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote: > 8.1 should be using the new shared row locks for doing the checks. This > should fix the case mentioned. Hmm. I'm looking in the "What's new" for 8.1 and see that now. I could have sworn it wasn't there the last time I looked. Sorry. > Up until now, we didn't have one to get on a per-record basis. Ok, I guess I'm confused... From the 7.3.2 manual (6.5 updates): "Multiversion concurrency control(MVCC) This removes our old table-level locking, and replaces it with a locking system that is superior to most commercial database systems. In a traditional system, each row that is modified is locked until committed, preventing reads by other users. MVCC uses the natural multiversion nature of PostgreSQL to allow readers to continue reading consistent data during writer activity. Writers continue to use the compact pg_log transaction system. This is all performed without having to allocate a lock for every row like traditional database systems. So, basically, we no longer are restricted by simple table-level locking; we have something better than row-level locking." So, until 8.1 PostgreSQL had "something better than row-level locking" for some things, but no row locking when needed? Or was it row locking is there, but just no shared row locking? Wes
On Fri, 25 Nov 2005, Wes wrote: > On 11/25/05 2:40 AM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote: > > > 8.1 should be using the new shared row locks for doing the checks. This > > should fix the case mentioned. > > Hmm. I'm looking in the "What's new" for 8.1 and see that now. I could > have sworn it wasn't there the last time I looked. Sorry. > > > Up until now, we didn't have one to get on a per-record basis. > > Ok, I guess I'm confused... From the 7.3.2 manual (6.5 updates): > > "Multiversion concurrency control(MVCC) This removes our old table-level > locking, and replaces it with a locking system that is superior to most > commercial database systems. In a traditional system, each row that is > modified is locked until committed, preventing reads by other users. MVCC > uses the natural multiversion nature of PostgreSQL to allow readers to > continue reading consistent data during writer activity. Writers continue to > use the compact pg_log transaction system. This is all performed without > having to allocate a lock for every row like traditional database systems. > So, basically, we no longer are restricted by simple table-level locking; we > have something better than row-level locking." > > So, until 8.1 PostgreSQL had "something better than row-level locking" for > some things, but no row locking when needed? Or was it row locking is > there, but just no shared row locking? The latter, the row locks before were single owner and were such that a second lock request for the same row would wait for the first to be released. Now effectively you have two levels of locks at the row level, the weaker of which conflicts with the stronger but not with itself. The thing about MVCC is that readers do not have to get either lock if they aren't trying to prevent modifications.
On 11/25/05 9:12 PM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote: > The latter, the row locks before were single owner and were such that a > second lock request for the same row would wait for the first to be > released. Now effectively you have two levels of locks at the row level, > the weaker of which conflicts with the stronger but not with itself. Now I understand. Thanks for clarifying. It sounds like there may be some performance improvements that will be beneficial to us also. Hopefully I can convince them to agree to the 8.1 upgrade. Besides verifying our application, it takes a full weekend to do the dump/load - the database is just under a billion rows. Wes
Some time ago I reported a problem where right after an index rebuild vacuum might take a couple of hours, but gets slower and slower at a seeming increasing rate as time goes on. What takes a couple of ours after a full index rebuild now takes 12 hours or more. The problem was determined to be due to the fact that indexes are vacuumed in index order, not in disk storage order. I don't see anything about this in the "What's new" for 8.1. Has anything been done to resolve this? Scheduled index rebuilds are not feasible. I really don't even want to take a chance on starting one. The database is just a hair under one billion rows, and could take the entire weekend or more to rebuild. Wes
Wes <wespvp@syntegra.com> writes: > The problem was determined to be due to the fact that indexes are vacuumed > in index order, not in disk storage order. I don't see anything about this > in the "What's new" for 8.1. Has anything been done to resolve this? No. Avoiding that would require a new approach to vacuum-vs-ordinary-indexscan interlocking, so it won't happen until someone has a Bright Idea (tm). regards, tom lane
On Mon, Dec 12, 2005 at 06:26:37PM -0500, Tom Lane wrote: > Wes <wespvp@syntegra.com> writes: > > The problem was determined to be due to the fact that indexes are vacuumed > > in index order, not in disk storage order. I don't see anything about this > > in the "What's new" for 8.1. Has anything been done to resolve this? > > No. Avoiding that would require a new approach to > vacuum-vs-ordinary-indexscan interlocking, so it won't happen until > someone has a Bright Idea (tm). Plus there is a TODO to only vacuum pages that are known to have dead tuples, which should hopefully mean no more index-scans during vacuum as well. Hopefully this makes it into 8.2... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> writes: > On Mon, Dec 12, 2005 at 06:26:37PM -0500, Tom Lane wrote: >> No. Avoiding that would require a new approach to >> vacuum-vs-ordinary-indexscan interlocking, so it won't happen until >> someone has a Bright Idea (tm). > Plus there is a TODO to only vacuum pages that are known to have dead > tuples, which should hopefully mean no more index-scans during vacuum as > well. No such luck. You delete any tuples, you need to scan the indexes. regards, tom lane
On Mon, Dec 12, 2005 at 11:09:01PM -0500, Tom Lane wrote: > "Jim C. Nasby" <jnasby@pervasive.com> writes: > > On Mon, Dec 12, 2005 at 06:26:37PM -0500, Tom Lane wrote: > >> No. Avoiding that would require a new approach to > >> vacuum-vs-ordinary-indexscan interlocking, so it won't happen until > >> someone has a Bright Idea (tm). > > > Plus there is a TODO to only vacuum pages that are known to have dead > > tuples, which should hopefully mean no more index-scans during vacuum as > > well. > > No such luck. You delete any tuples, you need to scan the indexes. Even though you can see what the index values were for the now-dead tuple? -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On 12/12/05 5:26 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >> The problem was determined to be due to the fact that indexes are vacuumed >> in index order, not in disk storage order. I don't see anything about this >> in the "What's new" for 8.1. Has anything been done to resolve this? > > No. Avoiding that would require a new approach to > vacuum-vs-ordinary-indexscan interlocking, so it won't happen until > someone has a Bright Idea (tm). Any ideas on how I might I reconfigure to mitigate the issue? Separating the most offending indexes to separate drives probably isn't an option. Wes
Wes <wespvp@syntegra.com> writes: > Any ideas on how I might I reconfigure to mitigate the issue? Separating > the most offending indexes to separate drives probably isn't an option. What are you using for vacuum_mem? A larger value should reduce the number of times we have to scan the indexes of a large table. regards, tom lane
Because of our build environment, we need to build PostgreSQL specifying the location of the openssl libraries to insure everyone is using the same libraries, regardless of the system software is built and executed on. At 7.4.5, we used: ./configure --with-openssl=path This no longer works with 8.x. See: <http://archives.postgresql.org/pgsql-committers/2003-11/msg00278.php> What is the correct way to work around this restriction? Set LD_LIBRARY_PATH? This is on a variety of unix platforms. Wes
Wes <wespvp@syntegra.com> writes: > ./configure --with-openssl=path > This no longer works with 8.x. See: > <http://archives.postgresql.org/pgsql-committers/2003-11/msg00278.php> > What is the correct way to work around this restriction? Use --with-includes and --with-libraries as needed. regards, tom lane
On 1/6/06 2:47 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >> ./configure --with-openssl=path >> This no longer works with 8.x. See: >> <http://archives.postgresql.org/pgsql-committers/2003-11/msg00278.php> >> What is the correct way to work around this restriction? > > Use --with-includes and --with-libraries as needed. That allows me to build, but doesn't set the library search path in the binaries ("-Wl,-rpath /blah/blah" on linux or "-R /blah/blah" for Solaris). This prevents programs from loading, since they can't find the libraries if they are not in the default library search path of the user. Wes
On 1/6/06 2:47 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >> ./configure --with-openssl=path >> This no longer works with 8.x. See: >> <http://archives.postgresql.org/pgsql-committers/2003-11/msg00278.php> >> What is the correct way to work around this restriction? > > Use --with-includes and --with-libraries as needed. That doesn't get the library paths into the binary. If the libraries are not in the default system search path, the user is screwed. Is there a way to solve this? I think the previous --with-openssl=path set the search path so the libraries would always be found. Wes
Wes <wespvp@syntegra.com> writes: > On 1/6/06 2:47 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >> Use --with-includes and --with-libraries as needed. > That doesn't get the library paths into the binary. If the libraries are > not in the default system search path, the user is screwed. Is there a way > to solve this? I think the previous --with-openssl=path set the search path > so the libraries would always be found. No, it didn't. Really the best solution to this is to not have commonly used libraries that aren't in the system search path ... regards, tom lane