Re: Error while loading sql file

Поиск
Список
Период
Сортировка
От Bèrto ëd Sèra
Тема Re: Error while loading sql file
Дата
Msg-id CAKwGa_96Y6c9GPNX3Ed7GQjH3jH5PmTm8_g-YUQgKhBsECiNsQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Error while loading sql file  (Adarsh Sharma <adarsh.sharma@orkash.com>)
Список pgsql-general
HI Ardash!

INSERT INTO conceptnet_frame VALUES(3884,'ja','{1}は{2}を持っている。',16,3,2140,NULL,NULL,NULL);
....

Can you still access the database that produced the dump? If so, you may want to produce a number of dumps for distinct language values. Japanese, in particular, is a very complicated issue, as Japanese change alphabet to produce the same effect Latin writers get by using bold and italic. So text searches are quite a nightmare, since DOG is written in different alphabets (Latin included), depending on what the author had in mind, in terms of underlining this specific word/expression.

If you could manage to insulate single languages things would definitely get less complex, as you can track down specific problems related to a single language (or maybe to an external source that inserted weird encodings). If the original DB is not available you could still filter the dump file for such values (although this is a very boring experience). 

One of the good things with this approach (in case you need to do heavy text searches on your data) is that (as of PG 9.1) you can model structures like:

CREATE COLLATION "de_DE.utf8" (
    LC_COLLATE = "de_DE.utf8", 
    LC_CTYPE = "de_DE.utf8" 
);
CREATE COLLATION "en_GB.utf8" (
    LC_COLLATE = "en_GB.utf8", 
    LC_CTYPE = "en_GB.utf8" 
);
# NOTE!! all these locales must pre-exist on your box, you are simply importing them into PG with this CREATE phase.

CREATE TABLE conceptnet_frame_root (
  id            BIGINT NOT NULL PRIMARY KEY,
  ISO693_1_code CHAR(2) NOT NULL
);

CREATE TABLE conceptnet_frame_en (
  linguistic_content TEXT COLLATE "en_GB.utf8" NOT NULL 
) INHERITS (conceptnet_frame_root);

CREATE TABLE conceptnet_frame_de (
  linguistic_content TEXT COLLATE "de_DE.utf8" NOT NULL
) INHERITS (conceptnet_frame_root);

Then have a view pick up the entire linguistic bouquet, if needed. Sadly you cannot leave 'linguistic_content' in the root table, because you won't be able to use an alter table on it later, to alter the collation at single inherited table level, so you are responsible of ensuring type consistency on your own. It does add complication to the model, however, if you are into serious conflicts among different languages this is the best solution I could come up with, thus far.


BTW, ISO 639-1 is a dangerous standard to use, if you are to make extensive language coverage, see

Hope this helps
Bèrto

--
==============================
If Pac-Man had affected us as kids, we'd all be running around in a darkened room munching pills and listening to repetitive music.

В списке pgsql-general по дате отправления:

Предыдущее
От: robins.tharakan@comodo.com
Дата:
Сообщение: Re: Why does index not use for CTE query?
Следующее
От: Filip Rembiałkowski
Дата:
Сообщение: Re: Show in psql does any calculations?