Re: Index creation takes more time?

Поиск
Список
Период
Сортировка
От Herouth Maoz
Тема Re: Index creation takes more time?
Дата
Msg-id D43CCEEB-E655-4BBE-B037-D046B0BF2889@unicell.co.il
обсуждение исходный текст
Ответ на Re: Index creation takes more time?  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-general
On 18/09/2012, at 20:19, Jeff Janes wrote:
> I think the one below will show an even larger discrepancy.  You are
> doing 2 casts for each comparison,
> so I think the casts overhead will dilute out the comparison.
>
> select count(distinct foo) from  ( select cast(random() as varchar(14)) as foo
>   from generate_series (1,100000000)) asdf;

Actually, it doesn't. I suspect that it doesn't actually do string comparison per se. I don't know how "distinct" is
implementedin PostgreSQL, but if it was me, I'd implement it with a hash table, which means that you calculate the hash
ofthe string rather than compare it. Even if it is done with actual comparison, I don't think it's a collation-based
comparison,but rather a byte-by-byte comparison. 

>
>
>> Finally, I created a test table, as you asked:
>>
>>
>>> create table foo as select msisdn,sme_reference from
>>> sms.billing__archive limit 1000000;
>>
>> Then I created an index on the msisdn and sme_reference columns together.
>> 99% of the data in the msisdn field consist of 11-digit phone numbers.
>> Result:
>>
>> PC: 5792.641 ms
>> Server: 23740.470 ms
>>
>> Huge discrepancy there.
>
> try:
> create index ON foo (msisdn COLLATE "C", sme_reference) ;
>
> This can only be done on 9.1 server, as that feature is new to that
> release.  It should be much faster to create than the index with
> default collation.
>
> (or change the collation of msisdn column definition, rather than just
> in the index).
>
> This assumes you just need the index for equality, not for some
> precise locale-specific ordering (which for phone numbers seems like a
> safe bet).


Yes, this certainly reduced the index creation time to within a reasonable margin. OK, now we have to decide whether to
movethe entire database to the 'C' collation (which would require, I suppose, a dump and restore) with the option of
changingcollation for specific columns that actually need it, or to just solve the current problem by changing the
indexcreation commands where relevant. 

Thank you very much for your help with this issue, your input has been invaluable.

Herouth

В списке pgsql-general по дате отправления:

Предыдущее
От: Raymond O'Donnell
Дата:
Сообщение: Re: Column aliases in WHERE clauses
Следующее
От: "Marc Mamin"
Дата:
Сообщение: drop table if exists ;