Thread: INSERTs becoming slower and slower


INSERTs becoming slower and slower

From:
Nörder-Tuitje, Marcus
Date:


Hi,

I am breaking up huge texts (between 25K and 250K words) into single words using PL/pgSQL.

For this I am using a temp table in the first step:

        LOOP
                -- scan from the current position to the next blank
                vLeft := vRight;
                vTmp  := vLeft;

                LOOP
                        vChr := SUBSTRING(pText FROM vTmp FOR 1);
                        vTmp := vTmp + 1;
                        EXIT WHEN (vChr = ' ' OR vChr IS NULL OR vTmp = cBorder);
                END LOOP;

                vRight := vTmp;

                -- the word lies between vLeft and the blank just before vRight
                vLit := SUBSTRING(pText FROM vLeft FOR (vRight - vLeft - 1));

                IF (LENGTH(vLit) > 0) THEN
                        WRDCNT := WRDCNT + 1;
                        INSERT INTO DEX_TEMPDOC (TMP_DOO_ID, TMP_SEQ_ID, TMP_RAWTEXT)
                        VALUES (pDOO_ID, I, vLit);
                END IF;

                I := I + 1;
                vTmp := LENGTH(vLit);   -- reused only for the NOTICE below

                -- report progress every 100 words
                IF ((WRDCNT % 100) = 0) THEN
                        PROGRESS := ROUND((100 * I) / DOCLEN, 0);
                        RAISE NOTICE '[PROC] % WORDS -- LAST LIT % (Len %) [% PCT / % of %]',
                                WRDCNT, vLit, vTmp, PROGRESS, I, DOCLEN;
                END IF;

                EXIT WHEN vRight >= cBorder;
        END LOOP;

The doc is preprocessed; between any two words there can be only a single blank.
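
Since the separator is guaranteed to be a single blank, the split could in principle also be done set-based, in one statement instead of ~100K single-row INSERTs. A rough sketch, using only 8.0-era functions (unnest() does not exist yet, so generate_series() indexes into the array):

        -- split on the single-blank separator and insert every
        -- non-empty word together with its position in one pass
        INSERT INTO DEX_TEMPDOC (TMP_DOO_ID, TMP_SEQ_ID, TMP_RAWTEXT)
        SELECT pDOO_ID, i, (string_to_array(pText, ' '))[i]
        FROM   generate_series(1, array_upper(string_to_array(pText, ' '), 1)) AS i
        WHERE  LENGTH((string_to_array(pText, ' '))[i]) > 0;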

My problem is: the first 25K words go quite quickly, but the inserts become slower and slower. Starting at 1K words per second, I end up with 100 words in 10 seconds once I reach 80K-100K words.

The only (non-unique) index on tempdoc is on RAWTEXT.

What can I do? Should I drop the index?
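
If the index is not needed while loading, one common pattern is to drop it, bulk-load, and rebuild it once at the end. A sketch (the index name here is made up):

        -- drop the index before the bulk load (hypothetical index name)
        DROP INDEX dex_tempdoc_rawtext_idx;
        -- ... run the word-splitting load here ...
        -- one CREATE INDEX at the end is much cheaper than maintaining
        -- the index across ~100K single-row inserts
        CREATE INDEX dex_tempdoc_rawtext_idx ON DEX_TEMPDOC (TMP_RAWTEXT);
        ANALYZE DEX_TEMPDOC;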

Here is my config:

shared_buffers = 2000           # min 16, at least max_connections*2, 8KB each
work_mem = 32768                # min 64, size in KB
maintenance_work_mem = 16384    # min 1024, size in KB
max_stack_depth = 8192          # min 100, size in KB

enable_hashagg = true
enable_hashjoin = true
enable_indexscan = true
enable_mergejoin = true
enable_nestloop = true
enable_seqscan = false

The machine is a 3 GHz Xeon with 1 GB RAM and a SATA RAID 1 array, running PostgreSQL 8.0.4 (precompiled i686 binaries).

Thanks!

Kind regards
Dipl.-Inform. Marcus Noerder-Tuitje
Developer

software technology AG
Kortumstraße 16
44787 Bochum
Tel:  0234 / 52 99 6 26
Fax: 0234 / 52 99 6 22
E-Mail:   noerder-tuitje@technology.de
Internet: www.technology.de

Re: INSERTs becoming slower and slower

From:
Christopher Kings-Lynne
Date:
You might find it faster to install contrib/tsearch2 for this sort of
text-indexing purpose...
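
For illustration, the usual tsearch2 pattern keeps one tsvector per document instead of one row per word (a sketch; the table and column names are hypothetical):

        -- after installing contrib/tsearch2 into the database:
        ALTER TABLE docs ADD COLUMN fti tsvector;
        UPDATE docs SET fti = to_tsvector('default', doc_text);
        CREATE INDEX docs_fti_idx ON docs USING gist (fti);
        -- word lookups then go through the index:
        SELECT doc_id FROM docs WHERE fti @@ to_tsquery('default', 'word');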

Nörder-Tuitje wrote:
> I am breaking up huge texts (between 25K and 250K words) into single
> words using PL/pgSQL.
> [...]
> My problem is: the first 25K words go quite quickly, but the inserts
> become slower and slower. Starting at 1K words per second, I end up
> with 100 words in 10 seconds once I reach 80K-100K words.
>
> The only (non-unique) index on tempdoc is on RAWTEXT.
>
> What can I do? Should I drop the index?


Re: INSERTs becoming slower and slower

From:
Matteo Beccati
Date:
Christopher Kings-Lynne wrote:
> You might find it faster to install contrib/tsearch2 for this sort of
> text-indexing purpose...
>
> Nörder-Tuitje wrote:
>> Here is my config:
>>
>> shared_buffers = 2000           # min 16, at least max_connections*2, 8KB each
>> work_mem = 32768                # min 64, size in KB
>> maintenance_work_mem = 16384    # min 1024, size in KB
>> max_stack_depth = 8192          # min 100, size in KB
>>
>> enable_hashagg = true
>> enable_hashjoin = true
>> enable_indexscan = true
>> enable_mergejoin = true
>> enable_nestloop = true
>> enable_seqscan = false
>>
>> The machine is a 3 GHz Xeon with 1 GB RAM and a SATA RAID 1 array,
>> running PostgreSQL 8.0.4 (precompiled i686 binaries).

Also, shared_buffers (server-wide) is low compared to work_mem: 2000
buffers × 8 KB is only about 16 MB of shared cache, while work_mem = 32768
allows 32 MB for each sort operation (and that is multiplied by your
concurrency level).

And disabling sequential scans in your postgresql.conf will probably
lead to sub-optimal plans for many queries.
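
For a dedicated 1 GB machine, a more balanced starting point might look
something like this (illustrative values only, not a tuned recommendation):

shared_buffers = 16384          # ~128 MB at 8 KB per buffer
work_mem = 8192                 # 8 MB per sort, times concurrent sorts
maintenance_work_mem = 65536    # 64 MB speeds up CREATE INDEX
enable_seqscan = true           # let the planner choose the plan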


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com