Improvement for query planner? (no, not about count(*) again ;-))

Поиск
Список
Период
Сортировка
От Tobias Völk
Тема Improvement for query planner? (no, not about count(*) again ;-))
Дата
Msg-id 1jxEsu-0bGx160@fwd35.t-online.de
обсуждение исходный текст
Ответы Re: Improvement for query planner? (no, not about count(*) again ;-))  (Francisco Olarte <folarte@peoplecall.com>)
Re: Improvement for query planner? (no, not about count(*) again ;-))  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs

Hello Postgres-Community,

 

 

I’ve got a table games(name1 text, name2 text) with 1.3x10^9 rows consisting of two two text columns for the names of players who’ve played a game, duplicate rows are possible, there’s no primary key since this table was just intended as a temporary storage for my data until further processing.

 

The length of a name is usually not more than 20 characters, shorter most of the time.

I’ve asked postgres to make an unlogged newtable(name text primary key) consisting of the unqiue names and executed:

 

Insert into newtable(name) select name1 from games on conflict do nothing;

(and later on intended to do the same for the second column)

 

However after hours it still wasn’t done, used only 1 cpu core to the max and read with 5 MB/s from my fast SSD.

So I stopped it.

I’ve also tried inserting (select name1 from games union select name2 from games) but it always wanted to do it using sorting.

But either the sorting or the preperations for the sorting were again only done using 1 core to the max and reading with 5 MB/s.

 

Couldn’t find a fast query for my problem.

 

So I wrote a java-program which read the whole table at a fetchsize of about 4 million and inserted the names into a HashSet.

And surprisingly after only a few minutes the program was already 25% done o.O

 

My question is, why isn’t postgres nearly this fast? Why doesn’t it just create a HashSet in RAM and read full speed from the disk?

I even created a hash index but it kept using it’s primary key b-tree and then I read that hash indices somehow don’t support checking for uniqueness.

 

Best regards, Tobi

 


Virus-free. www.avast.com

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Andy Fan
Дата:
Сообщение: Re: Reported type mismatch improperly
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: psql has some accessibility issues on Windows