Re: Performance problem with query

From: Christian Rengstl
Subject: Re: Performance problem with query
Date:
Msg-id: 44BE0A36.0AD0.0080.0@klinik.uni-regensburg.de
In reply to: Re: Antw: Performance problem with query  (Q <qdolan@gmail.com>)
Responses: Re: Performance problem with query  (Q <qdolan@gmail.com>)
List: pgsql-general
The EXPLAIN ANALYZE is from the exact query, and I dropped the indexes
before the insert as well, without improvement. The target table is also
completely empty, and the insert is supposed to write, in this case, more
or less 8 million lines into the table. There is a rule, though, because
I have an inheritance table structure with one master table and around
20 child tables.
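
To illustrate, the structure looks roughly like this (the child table
and rule names here are only examples, not the real ones; there are
around 20 such children):

    CREATE TABLE public.master_chr1 () INHERITS (public.master);

    CREATE RULE master_insert_chr1 AS
        ON INSERT TO public.master
        WHERE NEW.chr = 1
        DO INSTEAD
        INSERT INTO public.master_chr1
            VALUES (NEW.pid, NEW.smid, NEW.val1, NEW.val2, NEW.chr);

so every row inserted into public.master gets rewritten through such a
rule into the matching child table.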

Q <qdolan@gmail.com> wrote on 07/19/06 4:37 am:
>
> On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:
>
>> now finally after a long time i have the query plan for the whole
>> filled table. I hope somebody can explain me why it takes so much
>> longer...
>
>
> These explain analyze results don't appear to be from the queries you
> posted previously. For these results to mean anything you need to
> include the EXACT queries you used to generate them.
>
>> Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772.273 rows=8044000 loops=1)
>>   One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
>>   InitPlan
>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows=1 loops=1)
>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.002..0.002 rows=1 loops=1)
>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows=1 loops=1)
>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
>>   ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (actual time=0.002..191672.344 rows=8044000 loops=1)
>> Total runtime: 62259544.896 ms
>
>
> This is the query you want to look at; the others took no time at all.
>
> As a guess I would say the query is an INSERT INTO ... SELECT ...
> FROM ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
> The majority of the time appears to be taken up by the actual INSERT
> and not by the SELECT part.
>
> How many rows are in the target table and what indexes does it have?
> Does it have any triggers, check constraints, or rules applied to it?
> All these things can make the insert take longer as the number of
> rows you have already in the table increases.
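>
> You can check all of that quickly in psql; \d lists a table's
> indexes, rules, triggers, and check constraints, e.g.:
>
>   \d public.master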
>
> More than likely you have a target table with a LOT of rows and a
> bunch of indexes on it, and your disks are being thrashed because the
> indexes are not able to stay cached in RAM. At this point you should
> ensure your machine is not swapping to disk, and at the very least
> you should go through one of the many tuning guidelines available and
> ensure you have allocated the appropriate amount of memory to
> postgresql for your needs.
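>
> To see what you currently have allocated, for example:
>
>   SHOW shared_buffers;
>   SHOW work_mem;
>   SHOW maintenance_work_mem;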
>
> You may also want to consider dropping the indexes before you do the
> INSERT and recreate them afterwards.
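>
> Roughly like this (the index name and column here are made up; use
> your real definitions):
>
>   DROP INDEX master_pid_idx;
>   -- ... run the big INSERT ... SELECT ...
>   SET maintenance_work_mem = 262144;  -- 256MB (value in kB), speeds up the rebuild
>   CREATE INDEX master_pid_idx ON public.master (pid);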
>
>
>
>>>>> "Christian Rengstl" <Christian.Rengstl@klinik.uni-regensburg.de>
>>>>> 13.07.06 8:37 am >>>
>> Good morning list,
>>
>> the following query takes about 15 to 20 minutes for around 2
>> million lines in the file myfile.txt, but with 8 million lines it
>> takes around 5 hours, and I just don't understand why there is such
>> a huge discrepancy in performance.
>>
>> COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';
>>
>> INSERT INTO public.master(pid, smid, val1, val2, chr)
>>     SELECT pid, smid, val1, val12,
>>            CAST(split_part((SELECT chr FROM public.temp_table LIMIT 1), '_', 2) AS int2)
>>     FROM public.temp_table;
>>
>> INSERT INTO public.values(smid, pos, chr)
>>     SELECT DISTINCT smid, pos,
>>            CAST(split_part((SELECT chr FROM public.temp_table LIMIT 1), '_', 2) AS int2)
>>     FROM public.temp_table;
>>
>> I came up with this query because I wanted to use the COPY command
>> to load huge files into the db, but I don't want to load all the
>> columns contained in the file into only one table; instead I copy
>> some of them into one table and some into a second table. As I found
>> out with logging, the data is loaded into temp_table within 15
>> minutes, and transferring it from the temp_table took again only
>> something like 10 minutes. Can it be that the cast takes up so much
>> more time than when reading and transferring 2 million lines?
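>>
>> Just to illustrate what the cast does ('chr_1' is only an example
>> value; the real values come from the chr column):
>>
>>   SELECT CAST(split_part('chr_1', '_', 2) AS int2);  -- returns 1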
>
>
> --
> Seeya...Q
>
>                 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>
>                            _____  /  Quinton Dolan - qdolan@gmail.com
>    __  __/  /   /   __/   /      /
>       /    __  /   _/    /      /        Gold Coast, QLD, Australia
>    __/  __/ __/ ____/   /   -  /            Ph: +61 419 729 806
>                      _______  /
>                              _\
>
>
>
>


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230
